Commit graph

40 commits

Author SHA1 Message Date
Erico Nunes
d939f5d463 lima: fix nir shader memory leak
Fix memory leak on allocation for nir shader, reported by valgrind.

3,502 (480 direct, 3,022 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 84
   at 0x48483F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
   by 0x5750817: ralloc_size (ralloc.c:119)
   by 0x5750977: rzalloc_size (ralloc.c:151)
   by 0x575C173: nir_shader_create (nir.c:45)
   by 0x5763ACB: nir_shader_clone (nir_clone.c:728)
   by 0x55D5003: st_create_fp_variant (st_program.c:1242)
   by 0x55D789F: st_get_fp_variant (st_program.c:1522)
   by 0x55D789F: st_get_fp_variant (st_program.c:1507)
   by 0x56400C3: st_update_fp (st_atom_shader.c:163)
   by 0x563D333: st_validate_state (st_atom.c:261)
   by 0x55D07CB: prepare_draw (st_draw.c:132)
   by 0x55D08DF: st_draw_vbo (st_draw.c:184)
   by 0x55576CB: _mesa_draw_arrays (draw.c:374)
   by 0x55576CB: _mesa_draw_arrays (draw.c:351)

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-11-07 23:03:01 +00:00
Vasily Khoruzhick
65a5b24aee lima: add support for gl_PointSize
GP handles gl_PointSize similar to gl_Position, i.e. it needs
separate buffer and it has special type in varying descriptors, also
for indexed draw we need to emit special PLBU command to pass
address of gl_PointSize buffer.

Blob also clamps gl_PointSize to 1 .. 100 (as well as line width),
so let's do the same.

Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-11-05 17:44:56 -08:00
Vasily Khoruzhick
6dd0ad66de lima/ppir: add NIR pass to split varying loads
NIR may emit a single instrinsic to load several packed varyings,
but that's suboptimal for Utgard PP for several reasons:
- varyings that are used as sampler inputs can be passed using
  pipeline register with increased precision
- we have small number of regs, so using a vec4 regs for storing
  two vec2 varyings increases reg pressure.

Add NIR pass to split a single load into several loads and utilize
it in lima.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-26 18:51:10 -07:00
Vasily Khoruzhick
d214778753 lima: implement BO cache
Allocating BOs is expensive, so we should avoid doing that by caching
freed BOs.

BO cache is modelled after one in v3d driver and works as follows:

- in lima_bo_create() check if we have matching BO in cache and return
  it if there's one, allocate new BO otherwise.
- in lima_bo_unreference() (renamed from lima_bo_free()): put BO in
  cache instead of freeing it and remove all stale BOs from cache

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-22 19:20:59 -07:00
Vasily Khoruzhick
576341324d lima: run opt_algebraic between int_to_float and boot_to_float for vs
int_to_float emits ftrunc and ftrunc lowering generates bool ops.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-09 10:25:30 -07:00
Vasily Khoruzhick
e6dbf6d948 lima/gpir: lower fceil
GP doesn't support fceil so we need to lower it.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-09 10:25:30 -07:00
Vasily Khoruzhick
aa77fc309a lima/ppir: don't lower phis to scalar
Utgard PP is vec4 architecture, so lowering phis to scalars
increases instruction count and potentially interferes with
spilling.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-05 19:29:16 -07:00
Vasily Khoruzhick
517b60dc13 lima/ppir: don't lower vector {b,f}csel to scalar if condition is scalar
Utgard PP has vector fcsel operation, but its condition is scalar. Add
filtering callback that checks whether {b,f}csel condition is not scalar
to lower {b,f}csel to scalar only in this case.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Vasily Khoruzhick
9367d2ca37 nir: allow specifying filter callback in lower_alu_to_scalar
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Erico Nunes
4379dcc12d lima/ppir: enable vectorize optimization
pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 18:29:12 +00:00
Erico Nunes
2a8a81d109 lima/ppir: lower selects to scalars
nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 18:29:12 +00:00
Erico Nunes
27e7603c34 lima: fix ppir spill stack allocation
The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 20:08:59 +02:00
Vasily Khoruzhick
5adfc8602c lima/ppir: move sin/cos input scaling into NIR
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-06 17:49:22 +00:00
Erico Nunes
e0aeee9460 lima: add summary report for shader-db
Very basic summary, loops and gpir spills:fills are not updated yet and
are only there to comply with the strings to shader-db report.py regex.

For now it can be used to analyze the impact of changes in instruction
count in both gpir and ppir.

The LIMA_DEBUG=shaderdb setting can be useful to output stats on
applications other than shader-db.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-06 15:43:31 +02:00
Erico Nunes
360bda0b1d lima/ppir: enable lower_vector_cmp to lower fall_equal
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-05 23:36:46 +02:00
Erico Nunes
9e8f8dbcd1 lima: re-run nir_opt_algebraic after int lowering
nir_lower_int_to_float is currently only meant to run once, and some ops
must be lowered after being converted from int ops to be implementable,
so re-run nir_opt_algebraic after lowering ints to floats.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-05 23:36:35 +02:00
Vasily Khoruzhick
c780af7771 lima/ppir: move alu vec to scalar lowering into NIR
Utgard PP is vec4, but some operations are scalar, utilize
NIR vec to scalar lowering pass and indicate operations that we
want to lower.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-04 02:17:12 +00:00
Erico Nunes
82bf5a8aac lima: enable lower_bitops in ppir
The mali pp doesn't support integers and some nir_algebraic
optimizations may result in ops that are not easily lowerable to floats,
so disable optimizations resulting in bitops.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-31 23:06:26 +02:00
Erico Nunes
b3676a6548 nir/algebraic: rename lower_bitshift to lower_bitops
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-31 23:06:04 +02:00
Erico Nunes
99c956fb47 lima/ppir: lower fdot in nir_opt_algebraic
Now that we have fsum in nir, we can move fdot lowering there.
This helps reduce ppir complexity and enables the lowered ops to be part
of other nir optimizations in the optimization loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-31 21:35:58 +02:00
Erico Nunes
d2901de09e lima/ppir: lower texture projection
Lower texture projection in ppir using nir_lower_tex and nir_lower_tex.
This will insert a mul with the coordinate division before the load
varying.

Even though the lima pp supports projection in the load varying
instruction while loading the coordinates (from a register or a
varying), it requires that both the coordinates and projector be
components in a single register.
nir currently handles them in separate ssa, and attempting to merge them
manually may end up in worse code than just doing the coordinate
division manually. So for now let's just lower the projection to add
support for it in lima.
In the future, an optimization pass may be implemented in lima to ensure
that both coords and projector come in the same register, then this
lowering may be disabled and in this case lima may use the built-in
projection and save the mul instruction from lowering.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-31 21:22:41 +02:00
Connor Abbott
af95f80a24 lima/gp: Clean up lima_program_optimize_vs_nir() a little
Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-28 23:38:31 +02:00
Jonathan Marek
bc3b6168ba nir: replace lower_sincos with algebraic opt
This version has less ops for the same precision.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Connor Abbott
cc78a42577 lima: Reintroduce the standalone compiler
I used this to test things without needing to have a device handy.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Sagar Ghuge
456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Andreas Baierl
0cb9ce12fd lima/ppir: lower ffma in ppir
Since we cannot handle ffma in ppir, lower it on nir level already.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-24 11:57:57 +00:00
Erico Nunes
d72bbb2c89 lima: lower fmod in ppir and gpir
Since commit 4f3c82c72c fmod is no longer being lowered in nir, and
ends up crashing lima programs with "unsupported nir_op: fmod" in both
ppir and gpir.
There seems to be no mod operation in hardware in utgard and there is an
optimization in nir to lower fmod to instructions that lima already
implements, so let's use that.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-16 10:11:59 +00:00
Jonathan Marek
f387c2b238 nir: remove bool lowering from lower_int_to_float
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.

Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek
f889180ee1 nir: add lower_bitshift option
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Qiang Yu
a1d419603f lima/gpir: switch to use nir_lower_viewport_transform
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-20 10:57:11 +08:00
Jonathan Marek
d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Vasily Khoruzhick
6b46399e2f lima: enable sin and cos lowering for GP
GP doesn't support sin/cos natively, so we have to lower them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Christian Gmeiner
4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Vasily Khoruzhick
d4a249aa09 lima/gpir: enable lowering for ftrunc
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick
cf1ab4b96b lima: use int_to_float lowering pass
Neither GP nor PP in Mali4x0 support integers, so utilize new pass
and set native_integers to true for now until this flag is dropped.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Andreas Baierl
c960323a81 lima/ppir: Add gl_FragCoord handling
Treat gl_FragCoord variable as a system value and lower the w component
with a nir pass.
Add the necessary bits for correct codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-04-29 02:46:44 +00:00
Erico Nunes
2288b59ddc lima: enable nir fsign lowering in ppir
The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-19 15:42:23 +00:00
Karol Herbst
a55c7352d6 lima: add bool parameter to type_size function
Fixes: 035759b61b
       ("nir/i965/freedreno/vc4: add a bindless bool to type size functions")

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-04-12 17:08:53 +02:00
Icenowy Zheng
400f0bfba1 lima: lower bool to float when building shaders
Both processors of Mali Utgard are float-only, so bool are not
acceptable data type of them. Fortunately the NIR compiler
infrastructure has a lower pass to lower bool to float.

Call this lower pass to lower bool to float for both GP and PP. This
makes Glamor on Xorg server 1.20.3 at least doesn't hang when starting
gtk3-demo.

The old map of nir op bcsel is changed to fcsel, and the map of b2f32 in
PP is dropped because it's not needed now (it's originally only mapped
to ppir_op_mov).

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-04-12 13:40:47 +08:00
Qiang Yu
92d7ca4b1c gallium: add lima driver
v2:
- use renamed util_dynarray_grow_cap
- use DEBUG_GET_ONCE_FLAGS_OPTION for debug flags
- remove DRM_FORMAT_MOD_ARM_AGTB_MODE0 usage
- compute min/max index in driver

v3:
- fix plbu framebuffer state calculation
- fix color_16pc assemble
- use nir_lower_all_source_mods for lowering neg/abs/sat
- use float arrary for static GPU data
- add disassemble comment for static shader code
- use drm_find_modifier

v4:
- use lima_nir_lower_uniform_to_scalar

v5:
- remove nir_opt_global_to_local when rebase

Cc: Rob Clark <robdclark@gmail.com>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Koen Kooi <koen@dominion.thruhere.net>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: marmeladema <xademax@gmail.com>
Signed-off-by: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rohan Garg <rohan@garg.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-04-11 09:57:53 +08:00