Commit graph

38010 commits

Author SHA1 Message Date
Vinson Lee
be99100ee7 util: Silence uninitialized variable warnings. 2010-08-21 15:49:17 -07:00
Kenneth Graunke
e511a35fc5 glsl: Handle array declarations in function parameters.
The 'vec4[12] foo' style already worked, but the 'vec4 foo[12]' style
did not.  Also, 'vec4[] foo' was wrongly accepted.

Fixes piglit test cases array-19.vert and array-21.vert.

May fix fd.o bug #29684 (or at least part of it).
2010-08-21 15:42:27 -07:00
Luca Barbieri
4edeeaf715 nvfx: actually fix it properly 2010-08-21 23:53:39 +02:00
Luca Barbieri
251e48c64a nvfx: fix incorrect assert 2010-08-21 23:45:32 +02:00
Vinson Lee
4a6eb492e8 util: Move loop variable declaration outside for loop.
Fixes build error with MSVC.
2010-08-21 14:36:29 -07:00
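The kind of change described in 4a6eb492e8 can be sketched as follows. MSVC's C compiler of that era accepted only C89-style declarations, so a counter declared inside the `for` header did not build there. The function `sum_array` is a hypothetical example and is not taken from the util code.

```c
#include <stddef.h>

/* Hypothetical helper, not from Mesa: sums n integers. */
int sum_array(const int *values, size_t n)
{
   size_t i;        /* declared at the top of the block, as C89 requires */
   int total = 0;

   /* C99 would allow "for (size_t i = 0; ...)"; C89-only compilers reject it. */
   for (i = 0; i < n; i++)
      total += values[i];

   return total;
}
```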
Vinson Lee
489c787b80 nvfx: Fix SCons build.
Move declarations before code.
Fix void pointer arithmetic.
2010-08-21 14:29:50 -07:00
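As a rough illustration of the "void pointer arithmetic" item in 489c787b80: arithmetic on `void *` is a GNU extension, and ISO C compilers such as MSVC reject it. A portable form casts to a byte pointer first. The helper `buffer_at` is hypothetical and is not taken from the nvfx code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a pointer `offset` bytes into a buffer. */
void *buffer_at(void *base, size_t offset)
{
   /* "base + offset" on a void pointer only works as a GNU extension;
    * ISO C needs a complete object type, so go through a byte pointer. */
   uint8_t *bytes = (uint8_t *)base;

   return bytes + offset;
}
```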
Luca Barbieri
11d27871a7 nvfx: fix warnings 2010-08-21 23:09:43 +02:00
José Fonseca
0d96cbe4a5 gallivm: Emit DIVPS instead of RCPPS.
See comments for detailed rationale.

Thanks to Michal Krol and Zack Rusin for detecting and investigating this
in detail.
2010-08-21 21:58:22 +01:00
Luca Barbieri
42210f4464 nvfx: enable translate_sse 2010-08-21 21:56:29 +02:00
Vinson Lee
15d558c306 auxiliary: Add missing files to SCons build.
Add u_linear.c and u_linkages.c to SCons build.
Reorder list of files to be more alphabetical.
2010-08-21 12:32:17 -07:00
Vinson Lee
683118ccf2 auxiliary: Reorder list of files in Makefile.
This patch reorders the list of files so that the order is more alphabetic.
2010-08-21 12:21:59 -07:00
Vinson Lee
1badd3c43f scons: Fix nvfx build. 2010-08-21 12:00:57 -07:00
Luca Barbieri
d8e210eb11 nvfx: slightly improve handling of overlong vps 2010-08-21 20:42:15 +02:00
Luca Barbieri
5eddf95be9 nvfx: tweak CMP in fp 2010-08-21 20:42:15 +02:00
Luca Barbieri
8983621c6b nvfx: implement CMP in vp 2010-08-21 20:42:15 +02:00
Luca Barbieri
923f5c97b1 nvfx: implement TXL in fp 2010-08-21 20:42:15 +02:00
Luca Barbieri
847ac88671 nvfx: implement SSG in fp 2010-08-21 20:42:15 +02:00
Luca Barbieri
32d2525d64 nvfx: implement DP2 in vp and fp 2010-08-21 20:42:15 +02:00
Luca Barbieri
4aec8aa2e3 nvfx: implement TRUNC in vp and fp 2010-08-21 20:42:15 +02:00
Luca Barbieri
587d26fdf9 nvfx: implement NOP 2010-08-21 20:42:15 +02:00
Luca Barbieri
fe3c62dd77 nvfx: add vertex program control flow 2010-08-21 20:42:15 +02:00
Luca Barbieri
5287d86a0b nvfx: fix vertex shader headers 2010-08-21 20:42:15 +02:00
Luca Barbieri
af4a6eba55 nv40: add fragment program control flow 2010-08-21 20:42:15 +02:00
Luca Barbieri
cf0d156422 nvfx: refactor shader assembler 2010-08-21 20:42:15 +02:00
Luca Barbieri
28fa9451e1 nvfx: add option to dump shaders in TGSI and native code 2010-08-21 20:42:15 +02:00
Luca Barbieri
b2bad53478 nvfx: improve and correct nvfx_shader.h 2010-08-21 20:42:15 +02:00
Luca Barbieri
928cce672a nvfx: fix lodbias 2010-08-21 20:42:15 +02:00
Luca Barbieri
1dea9bc369 nvfx: mostly fix inline corruption magically
Not sure why this mostly works.
2010-08-21 20:42:15 +02:00
Luca Barbieri
ed232adc80 nvfx: fix GPU hardlocks when depth buffer is absent 2010-08-21 20:42:14 +02:00
Luca Barbieri
6931a01222 nvfx: fire ring after transfers
Might reduce the risk of running out of memory
2010-08-21 20:42:14 +02:00
Luca Barbieri
07b6fde410 nv30: band-aid viewport issues
For some reason nv30 seems to like to reset the viewport, even though
attempts to isolate where exactly it does that have currently been
inconclusive.
2010-08-21 20:42:14 +02:00
Luca Barbieri
0d74956a1f nvfx: support flatshade_first 2010-08-21 20:42:14 +02:00
Luca Barbieri
0184e29863 nvfx: expose GLSL
Still no control flow support, but basic stuff works.
2010-08-21 20:42:14 +02:00
Luca Barbieri
4d765f7fa3 nvfx: support proper shader linkage - adds glsl support 2010-08-21 20:42:14 +02:00
Luca Barbieri
8eb0fc430a nvfx: rewrite draw code and buffer code
This is a full rewrite of the drawing and buffer management logic.

It offers a lot of improvements:
1. A copy of buffers is now always kept in system memory. This is
   necessary to allow software processing of them, which is necessary
   or improves performance in many cases.
2. Support for pushing vertices on the FIFO, with index lookup if necessary.
3. "Smart" draw code that tries to intelligently choose the cheapest
   way to draw something: whether to use inline vertices or hardware
   vertex buffer, and whether to use hardware index buffers
4. Support for all vertex formats supported by the hardware
5. Usage of translate to push vertices, supporting all formats that are
   sensible to use as vertex formats
6. Support for base vertex
7. Usage of Ben Skeggs' primitive splitter originally for nv50, allowing
   correct splitting of line loops, triangle fans, etc.
8. Support for instancing
9. Precomputation using the vertex elements CSO

Thanks to Ben Skeggs for his primitive splitter originally for nv50.

Thanks to Christoph Bumiller for his nv50 push code, which was the basis
of this work, even though I changed his code dramatically, in particular
to replace his ad-hoc vertex data emitter with translate.

The changes could go into nv50 too, but there are substantial
differences due to the additional nv50 hardware features.
2010-08-21 20:42:14 +02:00
Luca Barbieri
73b7c6fb33 nvfx: refactor sampling code, add support for swizzles and depth tex
This is a significant refactoring of the sampling code that:
- Moves all generic functions in nvfx_fragtex.c
- Adds a driver-specific sampler view structure and uses it to
  precompute texture setup as it should be done
- Unifies a bit more of the code between nv30 and nv40
- Adds support for sampler view swizzles
- Support for specifying a sampler view format different from the
  resource one (only trivially)
- Support for sampler view specification of first and last level
- Support for depth textures on nv30, both for reading depth and
  for compare
- Support for sRGB textures
- Unifies the format table between nv30 and nv40
- Expands the format table to include essentially all supportable formats
  except mixed sign and "autonormal" formats
- Fixes the "is format supported" logic, which was quite broken, and
  makes it use the format table

Only tested on nv30 currently.
2010-08-21 20:42:14 +02:00
Luca Barbieri
4e2080a86e nvfx: new 2D: unify textures and buffers
Stop using the vtbl, and use real transfers for buffers too.
2010-08-21 20:42:14 +02:00
Luca Barbieri
0481ed25c9 nvfx: new 2D: use a CPU copy for up to 4 pixels, up from 0
Seems a reasonable threshold for now.

Significantly speeds up Piglit's 1x1 glReadPixels (but, you know,
reading pixels in 1x1 blocks is NOT a good idea, especially if you
might be running on a less-than-perfect driver).
2010-08-21 20:42:14 +02:00
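The threshold described in 0481ed25c9 could look roughly like the sketch below. The names `choose_copy_path`, `COPY_CPU`, `COPY_GPU` and `CPU_COPY_MAX_PIXELS` are hypothetical; only the value 4 comes from the commit message.

```c
/* Hypothetical names; only the 4-pixel threshold is from the commit message. */
enum copy_path { COPY_CPU, COPY_GPU };

#define CPU_COPY_MAX_PIXELS 4

/* Picks the CPU path for tiny rectangles, where GPU setup cost would
 * presumably dominate, and the GPU path for everything larger. */
enum copy_path choose_copy_path(unsigned width, unsigned height)
{
   return (width * height <= CPU_COPY_MAX_PIXELS) ? COPY_CPU : COPY_GPU;
}
```

With this rule a 1x1 or 2x2 read stays on the CPU, while a 4x2 rectangle already goes to the GPU.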
Luca Barbieri
28eb392a85 nvfx: new 2D: new render temporaries with resources
This patch adds support for creating temporary surfaces to allow
rendering to surfaces that cannot be rendered to.
It uses the _second_ version of the render temporary infrastructure.

This is necessary for swizzled 3D textures and small mipmaps of
swizzled 2D textures.

This version of the patch creates a resource to use as a temporary
instead of a raw BO, making the code simpler.
2010-08-21 20:42:14 +02:00
Luca Barbieri
ff74143fcc nv30: new 2D: support ARB_texture_rectangle
This uses nv30's _RECT formats.
2010-08-21 20:42:14 +02:00
Luca Barbieri
4793f48a19 nvfx: new 2D: optimize fragtex format lookup
Use an array indexed by the pipe format instead of doing a linear scan.
2010-08-21 20:42:14 +02:00
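The optimization in 4793f48a19, replacing a linear scan with an array indexed by the format, can be sketched as below. The enum, the struct and the `hw_code` values are invented for illustration and do not correspond to real pipe formats or nvfx hardware codes.

```c
#include <stddef.h>

/* Hypothetical stand-ins for pipe formats; values are array indices. */
enum fmt { FMT_NONE, FMT_RGBA8, FMT_RGB565, FMT_L8, FMT_COUNT };

struct tex_format {
   int hw_code;   /* made-up hardware code; 0 means "unsupported" here */
};

/* One slot per format, in enum order, so lookup is a single array access. */
static const struct tex_format format_table[FMT_COUNT] = {
   { 0 },   /* FMT_NONE */
   { 1 },   /* FMT_RGBA8 */
   { 2 },   /* FMT_RGB565 */
   { 3 },   /* FMT_L8 */
};

/* Returns the table entry for a format, or NULL if it is out of range
 * or marked unsupported. No loop over the table is needed. */
const struct tex_format *lookup_format(enum fmt f)
{
   if ((unsigned)f >= (unsigned)FMT_COUNT || format_table[f].hw_code == 0)
      return NULL;
   return &format_table[f];
}
```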
Luca Barbieri
d983701267 nvfx: new 2D: enable swizzling for all surfaces
Now that the new 2D code is in place, swizzling can be safely enabled.

Render temporaries are needed in some cases, so this may degrade nv30
a bit until it gets render temporaries too.
2010-08-21 20:42:14 +02:00
Luca Barbieri
9ed0686e8e nvfx: new 2D: use new 2D engine in Gallium
This patch implements nv04_surface_copy/fill using the new 2D engine module.
It supports falling back to the 3D engine using the u_blitter module, which will be
added in a later patch.

Also adds support for using the 3D engine, reusing the u_blitter module
created for r300.
This is used for unswizzling and copies between swizzled surfaces.
2010-08-21 20:42:14 +02:00
Luca Barbieri
24a4ea003f nv04-nv40: new 2D: add new Gallium-independent 2D engine
This patch adds a brand-new nv04-nv40 2D engine module.
It should correctly implement all operations involving swizzled and 3D-swizzled surfaces.

This code is independent from the Gallium framework and can thus be reused in the DDX and classic Mesa drivers (it's only likely to be useful in the latter, though).

Currently, surface_copy and surface_fill are broken for 3D textures, for swizzled source textures and possibly for some misaligned cases.

The code is based around the new nv04_region structure, which encapsulates the information from pipe_surface needed for the 2D engine and CPU copies.
The use of nv04_region makes the code independent of the Gallium framework and allows transforming the nv04_region without clobbering the pipe_surface.
The existing M2MF, blitter, and SWIZZLED_SURFACE paths have been improved and a new CPU path has been added.
There is also support to tell the caller to use the 3D engine.

The main feature of the copy/fill setup algorithm is linearization/contiguous-linearization of swizzled surfaces.
The idea of linearization is that some swizzled surfaces are laid out like linear ones (1xN, 2xN, Nx1) and can thus be used as such (e.g. useful for copying single pixels).
Also, some rectangles (e.g. the whole surface) are contiguous in memory. If both the source and destination rectangles are swizzled but contiguous, then both can be regarded as linear: this is the idea of "contiguous linearization".
This, for instance, allows using the 2D engine to duplicate the content of a swizzled surface to another swizzled surface, by pretending they are actually linear.
After linearization, the result may not be 64-byte aligned. Another transformation is done to enlarge the linear surface so that it becomes 64-byte aligned.
This is also used to 64-byte align swizzled texture mipmaps.

The inner loop of the CPU path is as optimized as possible without using SSE/SSE2.
Future improvements could include SSE/SSE2 support, and possibly a faster coordinate swizzling algorithm (which is however not used in the inner loop).
It may be a good idea to autogenerate swizzling code at least for all possible POT 2D texture dimensions (less than 256), maybe for all 3D ones too (less than 4096).
Also, it would be a very good idea to make a copy with the GPU first if the source surface is in uncached memory.
2010-08-21 20:42:14 +02:00
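The linearization idea in 24a4ea003f can be checked with a small sketch. The commit message does not state the exact bit order the hardware uses, so the code assumes a generic Morton (Z-order) layout in which x supplies the lowest bit and, once the smaller dimension runs out of bits, the remaining bits of the larger one are appended. The function names are hypothetical and offsets are in pixels.

```c
#include <assert.h>

/* Generic Morton-order offset for a w x h power-of-two surface.
 * Assumption (not from the commit): x supplies the lowest bit. */
unsigned swizzle_offset(unsigned x, unsigned y, unsigned w, unsigned h)
{
   unsigned offset = 0, shift = 0;

   assert(w && h && (w & (w - 1)) == 0 && (h & (h - 1)) == 0);

   /* Interleave one bit of x and one bit of y per round, skipping a
    * dimension once all of its bits have been consumed. */
   while (w > 1 || h > 1) {
      if (w > 1) {
         offset |= (x & 1) << shift++;
         x >>= 1;
         w >>= 1;
      }
      if (h > 1) {
         offset |= (y & 1) << shift++;
         y >>= 1;
         h >>= 1;
      }
   }
   return offset;
}

/* Returns 1 if every pixel has the same offset in both layouts. */
int swizzle_matches_linear(unsigned w, unsigned h)
{
   unsigned x, y;

   for (y = 0; y < h; y++)
      for (x = 0; x < w; x++)
         if (swizzle_offset(x, y, w, h) != y * w + x)
            return 0;
   return 1;
}
```

Under that assumption, `swizzle_matches_linear` holds for 1xN, 2xN and Nx1 surfaces, matching the cases listed in the message, and fails for shapes such as 8x2 or 4x4.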
Luca Barbieri
23639dc046 nvfx: new 2D: rewrite transfer code to use staging transfers
This greatly simplifies the code, and avoids ad-hoc copy code.

Also, these new transfers work for buffers too, even though they
are still used for miptrees only.
2010-08-21 20:42:14 +02:00
Luca Barbieri
ed2930e7e2 nvfx: new 2D: rewrite miptree code, adapt transfers
Changes:
- Disable swizzling on non-RGBA 2D textures, since the current 2D
  code is mostly broken in those cases. A later patch will fix this.
  Thanks to Andrew Randrianasulu who reported this.
- Fix compressed texture transfers and hack around the current 2D
  code's inability to copy compressed textures by using direct access.
  Thanks to Andrew Randrianasulu who reported this.

This patch rewrites all the miptree layout and transfer code in the
nvfx driver.

The current code is broken in several ways:
1. 3D textures are laid out first by face, then by level, which is
incorrect
2. Cube maps should have 128-byte aligned faces
3. Swizzled textures have a strange alignment test that seems
unnecessary
4. We store the image_offsets for each face/slice but they can be
easily computed instead
5. "Swizzling" is not supported for compressed formats. They can be
"swizzled" but swizzling only means that there are no gaps (pitch is
level-dependent) and the layout is still linear
6. Swizzling is not supported for non-RGBA formats. All formats (except
possibly depth) can be swizzled according to my testing.

The miptree layout is rewritten based on my empirical testing, which I
posted in the "miptree findings" mail.
The image_offset array is removed, since it can be calculated with a
simple multiplication; the only array in the miptree structure is now
the one for mipmap level starts, which it seems cannot be easily
computed in constant time.

Also, we now directly store a nouveau_bo instead of a pipe_buffer in
the miptree structure, like nv50 does.

Support for render temporaries is removed, and will be re-added in a
later patch.

Note that the current temporary code is broken, because it does not
copy the temporary back on render cache flushes.
2010-08-21 20:42:14 +02:00
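The point in ed2930e7e2 about dropping the image_offset array can be sketched for the cube map case. The sketch assumes each face holds a full mip chain and that faces are padded to 128 bytes, as item 2 of the message suggests. The struct, the function names and the caller-supplied level sizes are hypothetical and do not reflect the actual nvfx miptree structure.

```c
#include <assert.h>

#define SKETCH_MAX_LEVELS 16

/* Hypothetical miptree: the only array is the per-level start offset. */
struct miptree_sketch {
   unsigned num_levels;
   unsigned level_offset[SKETCH_MAX_LEVELS]; /* start of each level inside one face */
   unsigned face_size;                       /* bytes between consecutive faces */
};

static unsigned align_up(unsigned value, unsigned alignment)
{
   /* alignment must be a power of two */
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Accumulates level starts from caller-supplied level sizes and pads
 * the face size to 128 bytes. */
void miptree_sketch_init(struct miptree_sketch *mt,
                         const unsigned *level_sizes, unsigned num_levels)
{
   unsigned i, offset = 0;

   assert(num_levels <= SKETCH_MAX_LEVELS);
   mt->num_levels = num_levels;
   for (i = 0; i < num_levels; i++) {
      mt->level_offset[i] = offset;
      offset += level_sizes[i];
   }
   mt->face_size = align_up(offset, 128);
}

/* The per-face offset is one multiplication; no per-image array is stored. */
unsigned miptree_sketch_image_offset(const struct miptree_sketch *mt,
                                     unsigned face, unsigned level)
{
   assert(level < mt->num_levels);
   return face * mt->face_size + mt->level_offset[level];
}
```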
Luca Barbieri
ac97e8dba6 nvfx: add nouveau_resource_on_gpu
Add a function to get whether a resource is likely on the GPU or not.

Currently always returns TRUE.
2010-08-21 20:42:14 +02:00
Luca Barbieri
37fa0cf4ea nvfx: add linear flag for buffers 2010-08-21 20:42:14 +02:00
Luca Barbieri
6a73d99a52 nvfx: properly unreference bound objects on context destruction 2010-08-21 20:42:13 +02:00
Luca Barbieri
e189823eb4 nvfx: reference count bound objects 2010-08-21 20:42:13 +02:00