Suggested by Jason on a different patch after some comments /
questions by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabott0@gmail.com>
We weren't comparing the right number of components when checking
swizzles. Use nir_ssa_alu_instr_num_src_components() to do the right
thing.
No piglit regressions, and no fixes either.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Storing this here is pretty sketchy - I don't know if any driver other
than i965 will want to use it. But this will make it a lot easier to
generate NIR code at link time. We'll probably rework it anyway.
(Ian suggested making nir_assign_var_locations_scalar_direct_first
simply modify the nir_shader's fields, rather than passing pointers
to them. If this stays long term, we should do that. But Jason and
I suspect we'll be reworking this area again in the near future.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I guess I was looking too much at how lower_system_values worked when
writing lower_idiv.
Since ttn wasn't emitting load_var for sysvals and the only drivers
using lower_idiv were using ttn, I think nothing was broken as a result.
But might as well fix this before it becomes a problem.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Originally you had to have one or the other. But actually I don't want
either. (Or rather I want whatever is the minimum # of instructions.)
TODO: not sure where the best place to insert a check that driver hasn't
set *both* lower_negate and lower_sub?
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we're not generating linker errors, we don't actually modify
this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These should never happen. Plus, NIR passes really shouldn't be
reporting linker errors - this is past link time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't actually need a gl_program struct. We only used it to
translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage
(i.e. MESA_SHADER_VERTEX). We may as well just pass that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I want to use this in some code that doesn't currently include mtypes.h.
It seems like a better place for it anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This pass performs a mark and sweep pass over a nir_shader's associated
memory - anything still connected to the program will be kept, and any
dead memory we dropped on the floor will be freed.
The expectation is that this will be called when finished building and
optimizing the shader. However, it's also fine to call it earlier, and
many times, to free up memory earlier.
v2: (feedback from Jason Ekstrand)
- Skip sweeping impl->start_block, as it's already in the CF list.
- Don't sweep SSA defs (they're owned by their defining instruction)
- Don't steal phi sources (they're owned by nir_phi_instr).
- Don't steal tex->src (it's owned by the tex_inst itself)
- Don't sweep dereference chains (top-level dereferences are owned by
the instruction; sub-dereferences are owned by the parent deref).
- Don't sweep sources and destinations (SSA defs are handled as part of
the defining instruction, and registers are handled as part of
function implementations).
- Just steal instructions; don't walk them (no longer required).
v3: (feedback from Jason Ekstrand)
- Steal indirect sources from nir_src/nir_dest.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason pointed out that variable dereferences in NIR are really part of
their parent instruction, and should have the same lifetime.
Unlike in GLSL IR, they're not used very often - just for intrinsic
variables, call parameters & return, and indirect samplers for
texturing. Also, nir_deref_var is the top-level concept, and
nir_deref_array/nir_deref_record are child nodes.
This patch attempts to allocate nir_deref_vars out of their parent
instruction, and any sub-dereferences out of their parent deref.
It enforces these restrictions in the validator as well.
This means that freeing an instruction should free its associated
dereference chain as well. The memory sweeper pass can also happily
ignore them.
v2: Rename make_deref to evaluate_deref and make it take a nir_instr *
instead of void *. This involves adding &instr->instr everywhere.
(Requested by Jason Ekstrand.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can't allocate them out of the nir_ssa_def itself, because it may not
be ralloc'd (for example, nir_dest embeds a nir_ssa_def).
However, allocating them out of the instruction should work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Phi sources are part of the phi instruction and should have the same
lifetime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of the params array needs to be match the nir_call_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).
A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:
http://people.freedesktop.org/~robclark/div-lowering.py
I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Eric Anholt <eric@anholt.net> (vc4)
v2: discovered that i2b/b2i are also confused
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0. But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).
v2: use b2f rather than u2f
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As of da5ec2a, we allocate instruction sources out of the instruction
itself. When we realloc the texture sources we need to use the right
memory context or ralloc will get angry and assert-fail
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a pass to L1-normalize cube-map coordinates. Some hardware
such as i965 requires that largest cube-map coordinate is +-1. We had a
pass to perform this normalization in GLSL IR but we need it in NIR for
cube maps on ARB programs to work correctly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2 (Suggested by Eric):
- Do a vector fabs and split into components later
- Move to core NIR
Reviewed-by: Eric Anholt <eric@anholt.net>
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We use nir_ssa_defs for nir_builder args, so this takes a nir_src and
makes one so it can be passed in.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So far we'd only used nir_builder to build brand new programs. But if
we're doing modifications to instructions (like in a lowering pass), then
we want to generate new stuff before the instruction we're modifying.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of the sources array needs to be match the nir_tex_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These sets are part of the block, and their lifetime needs to match the
block itself. So, allocate them using the block itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of each register's use/def/if_use sets needs to match the
register itself. So, allocate them using the register itself as the
context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.
We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can just pass a pointer to the list of variables, and reuse the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <eric@anholt.net>
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <eric@anholt.net>