mesa/src/gallium
Marek Olšák 4a10d6154e radeonsi: move instance divisors into a constant buffer
Shader key size: 107 -> 47

Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors
are loaded from a constant buffer.

The shader code doing the division is huge. Is it something we need to
worry about? Does any app use instance divisors >= 2?

VS prolog disassembly:
    s_load_dwordx4 s[12:15], s[0:1], 0x80  ; C00A0300 00000080
    s_nop 0                                ; BF800000
    s_waitcnt lgkmcnt(0)                   ; BF8C007F
    s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
    s_waitcnt lgkmcnt(0)                   ; BF8C007F
    v_cvt_f32_u32_e32 v4, s14              ; 7E080C0E
    v_rcp_iflag_f32_e32 v4, v4             ; 7E084704
    v_mul_f32_e32 v4, 0x4f800000, v4       ; 0A0808FF 4F800000
    v_cvt_u32_f32_e32 v4, v4               ; 7E080F04
    v_mul_hi_u32 v5, v4, s14               ; D2860005 00001D04
    v_mul_lo_i32 v6, v4, s14               ; D2850006 00001D04
    v_cmp_eq_u32_e64 s[12:13], 0, v5       ; D0CA000C 00020A80
    v_sub_i32_e32 v5, vcc, 0, v6           ; 340A0C80
    v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
    v_mul_hi_u32 v5, v5, v4                ; D2860005 00020905
    v_add_i32_e32 v6, vcc, v5, v4          ; 320C0905
    v_subrev_i32_e32 v4, vcc, v5, v4       ; 36080905
    v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
    v_mul_hi_u32 v5, v4, v1                ; D2860005 00020304
    v_add_i32_e32 v4, vcc, s8, v0          ; 32080008
    v_mul_lo_i32 v6, v5, s14               ; D2850006 00001D05
    v_add_i32_e32 v7, vcc, 1, v5           ; 320E0A81
    v_cmp_ge_u32_e64 s[12:13], v1, v6      ; D0CE000C 00020D01
    v_sub_i32_e32 v6, vcc, v1, v6          ; 340C0D01
    v_cmp_le_u32_e32 vcc, s14, v6          ; 7D960C0E
    v_cndmask_b32_e64 v8, 0, -1, s[12:13]  ; D1000008 00318280
    v_cndmask_b32_e64 v6, 0, -1, vcc       ; D1000006 01A98280
    v_and_b32_e32 v6, v8, v6               ; 260C0D08
    v_cmp_eq_u32_e32 vcc, 0, v6            ; 7D940C80
    v_cndmask_b32_e32 v6, v7, v5, vcc      ; 000C0B07
    v_add_i32_e32 v5, vcc, -1, v5          ; 320A0AC1
    v_cmp_eq_u32_e32 vcc, 0, v8            ; 7D941080
    v_cndmask_b32_e32 v5, v6, v5, vcc      ; 000A0B06
    v_add_i32_e32 v5, vcc, s9, v5          ; 320A0A09

v2: set prefer_mono for fetched instance divisors

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 19:55:09 +02:00
..
auxiliary pipe_loader_sw: fix compilation warning 2017-06-27 07:49:02 -06:00
docs gallium/docs: improve docs for SAMPLE_POS, SAMPLE_INFO, TXQS, MSAA semantics 2017-06-16 14:07:31 -06:00
drivers radeonsi: move instance divisors into a constant buffer 2017-06-27 19:55:09 +02:00
include mesa/glthread: add glthread "perf" counters and pass them to gallium HUD 2017-06-26 02:17:03 +02:00
state_trackers mesa/glthread: add glthread "perf" counters and pass them to gallium HUD 2017-06-26 02:17:03 +02:00
targets i915: use different CFLAGS/LIBS variables than i965/anv 2017-06-27 14:10:29 +03:00
tests gallium: allow passing 'unsigned flags' to create_screen() 2017-06-23 19:50:20 +02:00
tools gallium/tools: use correct shebang for python scripts 2017-03-10 14:12:47 +00:00
winsys i915: use different CFLAGS/LIBS variables than i965/anv 2017-06-27 14:10:29 +03:00
Android.common.mk Android: rework LLVM build support 2017-05-11 13:52:21 +01:00
Android.mk gallium: Add renderonly-based support for pl111+vc4. 2017-06-15 11:41:22 -07:00
Automake.inc gallium/util: libunwind support 2017-04-03 11:32:17 -04:00
Makefile.am gallium: Add renderonly-based support for pl111+vc4. 2017-06-15 11:41:22 -07:00
README.portability
SConscript gallium: swr: Added swr build for windows 2016-11-21 12:44:47 -06:00

	      CROSS-PLATFORM PORTABILITY GUIDELINES FOR GALLIUM3D 


= General Considerations =

The state tracker and winsys driver support a rather limited number of
platforms. However, the pipe drivers are meant to run in a wide number of
platforms. Hence the pipe drivers, the auxiliary modules, and all public
headers in general, should strictly follow these guidelines to ensure


= Compiler Support =

* Include the p_compiler.h.

* Cast explicitly when converting to integer types of smaller sizes.

* Cast explicitly when converting between float, double and integral types.

* Don't use named struct initializers.

* Don't use variable number of macro arguments. Use static inline functions
instead.

* Don't use C99 features.

= Standard Library =

* Avoid including standard library headers. Most standard library functions are
not available in Windows Kernel Mode. Use the appropriate p_*.h include.

== Memory Allocation ==

* Use MALLOC, CALLOC, FREE instead of the malloc, calloc, free functions.

* Use align_pointer() function defined in u_memory.h for aligning pointers
 in a portable way.

== Debugging ==

* Use the functions/macros in p_debug.h.

* Don't include assert.h, call abort, printf, etc.


= Code Style =

== Inherantice in C ==

The main thing we do is mimic inheritance by structure containment.

Here's a silly made-up example:

/* base class */
struct buffer
{
  int size;
  void (*validate)(struct buffer *buf);
};

/* sub-class of bufffer */
struct texture_buffer
{
  struct buffer base;  /* the base class, MUST COME FIRST! */
  int format;
  int width, height;
};


Then, we'll typically have cast-wrapper functions to convert base-class 
pointers to sub-class pointers where needed:

static inline struct vertex_buffer *vertex_buffer(struct buffer *buf)
{
  return (struct vertex_buffer *) buf;
}


To create/init a sub-classed object:

struct buffer *create_texture_buffer(int w, int h, int format)
{
  struct texture_buffer *t = malloc(sizeof(*t));
  t->format = format;
  t->width = w;
  t->height = h;
  t->base.size = w * h;
  t->base.validate = tex_validate;
  return &t->base;
}

Example sub-class method:

void tex_validate(struct buffer *buf)
{
  struct texture_buffer *tb = texture_buffer(buf);
  assert(tb->format);
  assert(tb->width);
  assert(tb->height);
}


Note that we typically do not use typedefs to make "class names"; we use
'struct whatever' everywhere.

Gallium's pipe_context and the subclassed psb_context, etc are prime examples 
of this.  There's also many examples in Mesa and the Mesa state tracker.