Fix memory leak on allocation for nir shader, reported by valgrind.
3,502 (480 direct, 3,022 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 84
at 0x48483F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
by 0x5750817: ralloc_size (ralloc.c:119)
by 0x5750977: rzalloc_size (ralloc.c:151)
by 0x575C173: nir_shader_create (nir.c:45)
by 0x5763ACB: nir_shader_clone (nir_clone.c:728)
by 0x55D5003: st_create_fp_variant (st_program.c:1242)
by 0x55D789F: st_get_fp_variant (st_program.c:1522)
by 0x55D789F: st_get_fp_variant (st_program.c:1507)
by 0x56400C3: st_update_fp (st_atom_shader.c:163)
by 0x563D333: st_validate_state (st_atom.c:261)
by 0x55D07CB: prepare_draw (st_draw.c:132)
by 0x55D08DF: st_draw_vbo (st_draw.c:184)
by 0x55576CB: _mesa_draw_arrays (draw.c:374)
by 0x55576CB: _mesa_draw_arrays (draw.c:351)
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
GP handles gl_PointSize similar to gl_Position, i.e. it needs
separate buffer and it has special type in varying descriptors, also
for indexed draw we need to emit special PLBU command to pass
address of gl_PointSize buffer.
Blob also clamps gl_PointSize to 1 .. 100 (as well as line width),
so let's do the same.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
NIR may emit a single instrinsic to load several packed varyings,
but that's suboptimal for Utgard PP for several reasons:
- varyings that are used as sampler inputs can be passed using
pipeline register with increased precision
- we have small number of regs, so using a vec4 regs for storing
two vec2 varyings increases reg pressure.
Add NIR pass to split a single load into several loads and utilize
it in lima.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Allocating BOs is expensive, so we should avoid doing that by caching
freed BOs.
BO cache is modelled after one in v3d driver and works as follows:
- in lima_bo_create() check if we have matching BO in cache and return
it if there's one, allocate new BO otherwise.
- in lima_bo_unreference() (renamed from lima_bo_free()): put BO in
cache instead of freeing it and remove all stale BOs from cache
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
GP doesn't support fceil so we need to lower it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Utgard PP is vec4 architecture, so lowering phis to scalars
increases instruction count and potentially interferes with
spilling.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Utgard PP has vector fcsel operation, but its condition is scalar. Add
filtering callback that checks whether {b,f}csel condition is not scalar
to lower {b,f}csel to scalar only in this case.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Very basic summary, loops and gpir spills:fills are not updated yet and
are only there to comply with the strings to shader-db report.py regex.
For now it can be used to analyze the impact of changes in instruction
count in both gpir and ppir.
The LIMA_DEBUG=shaderdb setting can be useful to output stats on
applications other than shader-db.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
nir_lower_int_to_float is currently only meant to run once, and some ops
must be lowered after being converted from int ops to be implementable,
so re-run nir_opt_algebraic after lowering ints to floats.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Utgard PP is vec4, but some operations are scalar, utilize
NIR vec to scalar lowering pass and indicate operations that we
want to lower.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
The mali pp doesn't support integers and some nir_algebraic
optimizations may result in ops that are not easily lowerable to floats,
so disable optimizations resulting in bitops.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Now that we have fsum in nir, we can move fdot lowering there.
This helps reduce ppir complexity and enables the lowered ops to be part
of other nir optimizations in the optimization loop.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Lower texture projection in ppir using nir_lower_tex and nir_lower_tex.
This will insert a mul with the coordinate division before the load
varying.
Even though the lima pp supports projection in the load varying
instruction while loading the coordinates (from a register or a
varying), it requires that both the coordinates and projector be
components in a single register.
nir currently handles them in separate ssa, and attempting to merge them
manually may end up in worse code than just doing the coordinate
division manually. So for now let's just lower the projection to add
support for it in lima.
In the future, an optimization pass may be implemented in lima to ensure
that both coords and projector come in the same register, then this
lowering may be disabled and in this case lima may use the built-in
projection and save the mul instruction from lowering.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
This version has less ops for the same precision.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Since we cannot handle ffma in ppir, lower it on nir level already.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Since commit 4f3c82c72c fmod is no longer being lowered in nir, and
ends up crashing lima programs with "unsupported nir_op: fmod" in both
ppir and gpir.
There seems to be no mod operation in hardware in utgard and there is an
optimization in nir to lower fmod to instructions that lima already
implements, so let's use that.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GP doesn't support sin/cos natively, so we have to lower them.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Driver which do not support native integers should use a lowering
pass to go from integers to floats.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Neither GP nor PP in Mali4x0 support integers, so utilize new pass
and set native_integers to true for now until this flag is dropped.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Treat gl_FragCoord variable as a system value and lower the w component
with a nir pass.
Add the necessary bits for correct codegen.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 035759b61b
("nir/i965/freedreno/vc4: add a bindless bool to type size functions")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Both processors of Mali Utgard are float-only, so bool are not
acceptable data type of them. Fortunately the NIR compiler
infrastructure has a lower pass to lower bool to float.
Call this lower pass to lower bool to float for both GP and PP. This
makes Glamor on Xorg server 1.20.3 at least doesn't hang when starting
gtk3-demo.
The old map of nir op bcsel is changed to fcsel, and the map of b2f32 in
PP is dropped because it's not needed now (it's originally only mapped
to ppir_op_mov).
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>