mesa/src
Roland Scheidegger 8aa168eb8f llvmpipe: use vector loads for (optimized) tri raster funcs
When we switched to 64bit rasterization, we could no longer use straight
aligned loads for loading the plane data. However, what the code actually
does to load 3 planes is 12 scalar loads + 9 unpacks, followed by another
8 unpacks for the transpose we need (!).

It would of course be possible to do the (scalar) loads already transposed
(saving at least the additional unpacks). Instead, just use (un)aligned
vector loads and recalculate the eo values, which takes far fewer
instructions (note that in the triangle_32_3_4 case the eo values are not
even used, making the scalar loads + unpacks for them all the more
pointless).
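
A minimal sketch of the idea above, not the code from this commit. The
struct layout, the names plane_sketch, eo_from_steps and recalc_eo3, and the
eo sign convention (eo = max(dcdy, 0) - min(dcdx, 0)) are assumptions made
for illustration. It loads the first 16 bytes of each plane with one
unaligned vector load, gathers dcdx/dcdy across the three planes, and
recomputes eo instead of loading it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical plane layout, assumed for this sketch only. */
struct plane_sketch {
   int64_t c;      /* 64-bit edge function value */
   int32_t dcdx;   /* step in x */
   int32_t dcdy;   /* step in y */
   uint32_t eo;    /* stored edge offset, deliberately not loaded below */
   uint32_t pad;
};

/* Scalar reference, using the assumed convention
 * eo = max(dcdy, 0) - min(dcdx, 0). */
static uint32_t eo_from_steps(int32_t dcdx, int32_t dcdy)
{
   uint32_t eo = 0;
   if (dcdy > 0)
      eo += (uint32_t)dcdy;
   if (dcdx < 0)
      eo -= (uint32_t)dcdx;
   return eo;
}

/* Recompute eo for three planes; the unused fourth lane is set to 0. */
static void recalc_eo3(const struct plane_sketch plane[3], uint32_t eo_out[4])
{
   /* The vector path relies on dcdx/dcdy being bytes 8..15 of a plane. */
   assert(offsetof(struct plane_sketch, dcdx) == 8);
   assert(offsetof(struct plane_sketch, dcdy) == 12);
#if defined(__SSE2__)
   /* One unaligned load per plane: lanes are c_lo, c_hi, dcdx, dcdy. */
   __m128i p0 = _mm_loadu_si128((const __m128i *)&plane[0]);
   __m128i p1 = _mm_loadu_si128((const __m128i *)&plane[1]);
   __m128i p2 = _mm_loadu_si128((const __m128i *)&plane[2]);
   __m128i zero = _mm_setzero_si128();

   /* Partial transpose: only the dcdx/dcdy lanes are needed here. */
   __m128i t0 = _mm_unpackhi_epi32(p0, p1);   /* dx0 dx1 dy0 dy1 */
   __m128i t1 = _mm_unpackhi_epi32(p2, zero); /* dx2 0   dy2 0   */
   __m128i dcdx = _mm_unpacklo_epi64(t0, t1); /* dx0 dx1 dx2 0   */
   __m128i dcdy = _mm_unpackhi_epi64(t0, t1); /* dy0 dy1 dy2 0   */

   /* Sign masks select negative dcdx and non-negative dcdy lanes. */
   __m128i dcdx_neg = _mm_srai_epi32(dcdx, 31);
   __m128i dcdy_neg = _mm_srai_epi32(dcdy, 31);
   __m128i eo = _mm_sub_epi32(_mm_andnot_si128(dcdy_neg, dcdy),
                              _mm_and_si128(dcdx_neg, dcdx));
   _mm_storeu_si128((__m128i *)eo_out, eo);
#else
   /* Portable fallback for targets without SSE2. */
   for (int i = 0; i < 3; i++)
      eo_out[i] = eo_from_steps(plane[i].dcdx, plane[i].dcdy);
   eo_out[3] = 0;
#endif
}
```

In this sketch the eo recomputation costs two shifts, two logic ops and one
subtract for all planes at once, and the eo field is never read.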

This drops execution time of the triangle_32_3_4 function considerably,
although it doesn't make a measurable difference overall (for small tris
we're essentially limited by vertex throughput in any case). For
triangle_32_3_16 the change is essentially noise (the loop is more costly
than the initial code there).

(I'm thinking about just ditching the eo values stored in the plane data,
which would allow switching back to aligned planes; however, right now they
are still used by the other raster functions that handle planes with scalar
code. Also not touching the ppc code; it might not be that bad there in any
case.)

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 05:58:19 +01:00
..
compiler nir: Add lowering support for unpacking opcodes. 2016-02-01 10:43:57 -08:00
egl egl/dri2: expose srgb configs when KHR_gl_colorspace is available 2016-01-22 11:55:54 +00:00
gallium llvmpipe: use vector loads for (optimized) tri raster funcs 2016-02-02 05:58:19 +01:00
gbm gbm.h: Add a missing stddef.h include for size_t. 2015-10-30 19:12:14 +00:00
getopt
glx glx/dri3: a drawable might not be bound at wait time 2015-12-21 06:43:58 -05:00
gtest
hgl
loader virtio_gpu: Add PCI ID to driver map 2016-01-23 12:35:24 +10:00
mapi glapi: add GL_OES_geometry_shader extension 2016-01-22 17:13:55 +01:00
mesa i965: Provide sse2 version for rgba8 <-> bgra8 swizzle 2016-02-02 05:58:19 +01:00
util ralloc: Fix ralloc_adopt() to the old context's last child's parent. 2015-12-18 23:30:51 -08:00
Makefile.am glsl: move to compiler/ 2016-01-26 16:08:33 +00:00
SConscript glsl: move to compiler/ 2016-01-26 16:08:33 +00:00