mesa/src
Nicolai Hähnle 18616e7551 radeonsi: pack GS output components for each vertex stream contiguously
Note that the memory layout of one vertex stream inside one "item" (= memory
written by one GS wave) on the GSVS ring is:

  t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0
  t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1
                        ...
  t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL
  t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0
  t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1
                        ...
  t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL

                        ...

  t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0
  t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1
                        ...
  t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL

where tNN indicates the thread number, vNN the vertex number (in the order of
EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and
component, respectively).
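
As an illustration only, the diagram above can be read as an index
computation for a single vertex stream. The sketch below is not code from
the driver: the helper name gsvs_dword_index and its parameters are
hypothetical, it counts in whole components rather than bytes, and it
assumes a 64-thread wave split into groups of 16 threads exactly as drawn.
It does not model how several streams are combined or how the VGT registers
are programmed.

```c
#include <assert.h>

/* Hypothetical helper: position of one output component inside one item,
 * for a single vertex stream, counted in components (not bytes).
 * Follows the diagram literally: threads are grouped by 16; inside a
 * group there is one row per component; inside a row the vertices follow
 * each other; inside a vertex the 16 threads are adjacent. */
static unsigned gsvs_dword_index(unsigned thread, unsigned vertex,
                                 unsigned component,
                                 unsigned num_vertices,
                                 unsigned num_components)
{
    const unsigned group_width = 16; /* swizzle width from the diagram */

    assert(thread < 64);             /* assumed wave size */
    assert(vertex < num_vertices);
    assert(component < num_components);

    unsigned group = thread / group_width;             /* which block of 16 threads */
    unsigned lane = thread % group_width;              /* position inside that block */
    unsigned row_size = num_vertices * group_width;    /* one component row */
    unsigned group_size = num_components * row_size;   /* all rows of one block */

    return group * group_size + component * row_size +
           vertex * group_width + lane;
}
```

For example, with 2 vertices and 3 components, t0v1c0 lands at index 16,
t0v0c1 at index 32, and t16v0c0 at index 96, which matches the row and
block boundaries in the diagram.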

The vertex streams are laid out sequentially.

The swizzling by 16 threads is hard-coded in the way the VGT generates the
offset passed into the GS copy shader, and the jump every 16 threads is
calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way
that makes it difficult to deviate from this layout (at least that's what
I've experimentally confirmed on VI after first trying to go the simpler
route of just interleaving the vertex streams).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:00 +01:00
amd radv/ac: some fix maybe-uninitialized warnings 2016-12-10 21:46:56 +01:00
compiler glsl: remember per-component vertex streams for packed varyings 2016-12-12 09:03:47 +01:00
egl egl: add and enable EGL_KHR_config_attribs 2016-12-09 17:36:28 +00:00
gallium radeonsi: pack GS output components for each vertex stream contiguously 2016-12-12 09:05:00 +01:00
gbm gbm: request correct version of the DRI2_FENCE extension 2016-11-22 15:56:44 +00:00
getopt Introduce .editorconfig 2016-08-31 17:06:54 -07:00
glx dri: make use of loader_get_extensions_name(..) helper 2016-11-15 18:15:16 +00:00
gtest Introduce .editorconfig 2016-08-31 17:06:54 -07:00
hgl glapi/hgl: remove the final user of _glapi_check_table() 2016-10-06 15:03:46 +01:00
intel intel/aubinator: fix 32bit shift overflow warning 2016-12-11 20:04:15 +01:00
loader loader: automake: whitespace cleanup 2016-11-21 14:46:40 +00:00
mapi mesa: add missing CONTEXT_ROBUST_ACCESS enum 2016-10-27 07:06:41 +03:00
mesa st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSI 2016-12-12 09:04:03 +01:00
util util: import CRC32 implementation from gallium 2016-11-22 18:05:51 +01:00
vulkan/wsi vulkan: use STATIC_ASSERT instead of static_assert 2016-12-07 22:32:38 +11:00
Makefile.am amd: flatten amd/common makefile structure 2016-11-15 20:04:37 +00:00
SConscript scons: put the generated git_sha1.h file in top-level src/ directory 2016-06-17 10:33:00 -06:00