Commit graph

64076 commits

Author SHA1 Message Date
Rob Clark
db193e5ad0 freedreno/ir3: split out shader compiler from a3xx
Move the bits we want to share between generations from fd3_program to
ir3_shader.  So overall structure is:

  fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
                                    |- ...
                                    \- ir3_shader_variant -> ir3

So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles it's own ir.

There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.

Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
Rob Clark
7d7e6ae9c3 freedreno/a3xx/compiler: rename ir3_shader to ir3
First step of reoganization split out compiler (so it can be shared
between a3xx and a4xx).  Rename ir3_shader -> ir3 (since we'll want
the name ir3_shader for a higher level object).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
Rob Clark
faaeddb55e freedreno/a3xx/compiler: scheduler vs pred reg
The scheduler also needs to be aware of predicate register (p0) in
addition to address register (a0).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
Rob Clark
9f391322a0 freedreno/a3xx/compiler: little cleanups
Remove some obsolete comments, rename deref->addr.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
Rob Clark
d48faad3c2 freedreno/a3xx: enable/disable wa's based on patch-level
It seems like for the most part, different behaviors, workarounds, etc,
should be conditional on GPU patch revision (ie. a320.0 vs a320.2)
rather than GPU id (a320 vs a330).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
Rob Clark
9613ca569f freedreno/a3xx/compiler: make IR heap dyanmic
The fixed size heap is a remnant of the fdre-a3xx assembler.  Yet it is
convenient for being able to free the entire data structure in one shot
without worrying about leaking nodes.

Change it to dynamically grow the heap size (adding chunks) as needed so
we don't have an artificial upper limit on shader size (other than hw
limits) and don't always have to allocate worst-case size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
Jan Vesely
0bc1fa22d8 r600g/compute: Fix singed/unsigned comparison compiler warnings.
The iteration variables go from 0 anyway.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-25 12:55:05 -04:00
Tom Stellard
0ec8587642 clover: Query the device to see if images are supported
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-07-25 12:49:45 -04:00
Tom Stellard
1607a8efc1 gallium: Add PIPE_CAP_COMPUTE_IMAGES_SUPPORTED
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-07-25 12:49:20 -04:00
Bruno Jiménez
d6b89aef26 r600g/compute: Allow compute_memory_defrag to defragment between resources
This will be used in the following patch to avoid duplicated code

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-25 12:38:42 -04:00
Bruno Jiménez
5cf108078c r600g/compute: Allow compute_memory_move_item to move items between resources
v2: Remove unnecesary variables

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-25 12:38:28 -04:00
Dylan Baker
bf1247936a gbm: Search LIBGL_DRIVERS_PATH if GBM_DRIVERS_PATH is not set
The GBM_DRIVERS_PATH environment variable is not documented, and only
used to set the location of gbm drivers, while LIBGL_DRIVERS_PATH is
used for everything else, and is documented.

Generally this split leads to confusion as to why gbm doesn't work.

This patch will read LIBGL_DRIVERS_PATH as a fallback if
GBM_DRIVERS_PATH is not set.

The comments clearly indicate that using LIBGL_DRIVERS_PATH is
preferred over GBM_DRIVERS_PATH.

v2: - Use GBM_DRIVERS_PATH as a fallback
v3: [jordan.l.justen@intel.com] - Make LIBGL_DRIVERS_PATH the fallback

Signed-off-by: Dylan Baker <baker.dylan.c@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2014-07-24 23:15:06 -07:00
Jerome Glisse
cce58147eb winsys/radeon: fix indentation
Can we please keep it clean and avoid ending up in messy situation
like ddx.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2014-07-24 17:30:31 -04:00
Jason Ekstrand
989d2e3709 Add an accelerated version of F_TO_I for x86_64
According to a quick micro-benchmark, this new version is 20% faster on my
Haswell laptop.

v2: Removed the XXX note about x86_64 from the comment
v3: Use an intrinsic instead of an __asm__ block.  This should give us MSVC
    support for free.
v4: Enable it for all x86_64 builds, not just with USE_X86_64_ASM

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-24 12:44:56 -07:00
Matt Turner
2a33510f16 i965/fs: Decide predicate/predicate_inverse outside of the for loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-24 11:27:44 -07:00
Matt Turner
96128d134b i965/fs: Swap if/else conditions in SEL peephole.
Will clarify make the next commit easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-24 11:27:44 -07:00
Matt Turner
ac2acf04f7 i965: Improve dead control flow elimination.
... to eliminate an ELSE instruction followed immediately by an ENDIF.

instructions in affected programs:     704 -> 700 (-0.57%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-24 11:27:43 -07:00
Ilia Mirkin
0ddc28b026 nvc0/ir: support 2d constbuf indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:42 -04:00
Ilia Mirkin
4eef537960 gm107/ir: emit LDC subops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:42 -04:00
Ilia Mirkin
fc3d5fe01d gk110/ir: emit load constant subop
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
9c4959d0df mesa/st: add support for interpolate_at_* ops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
dfb0ca1606 nv50/ir: fix phi/union sources when their def has been merged
In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.

This maintains the invariant that a phi node's defs and sources are
allocated the same register.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
32702cceed nv50/ir: fix hard-coded TYPE_U32 sized register
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
3f6b34bacc nvc0: mark shader header if fp64 is used
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
b21a28797c nv50/ir: keep track of whether the program uses fp64
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
47e5a8d7a2 nvc0: make sure that the local memory allocation is aligned to 0x10
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2014-07-24 08:26:41 -04:00
Ilia Mirkin
637b6c2478 mesa: add ARB_clear_texture.xml to file list, remove duplicate decls
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-07-24 08:26:41 -04:00
Chia-I Wu
9d6166880d ilo: check the tilings of imported handles
Just to be cautious.
2014-07-24 13:38:51 +08:00
Chia-I Wu
cbc943c43e ilo: clean up resource bo renaming
s/alloc_bo/rename_bo/ as that is what the functions do.  Simplify bo
allocation and move the complexity to bo renaming.
2014-07-24 13:21:35 +08:00
Chia-I Wu
cf8c9947a8 ilo: share some code between {tex,buf}_create_bo
Add resource_get_bo_name() and resource_get_bo_initial_domain() for use by
both functions.
2014-07-24 10:49:02 +08:00
Chia-I Wu
c1a1a627c4 ilo: use native 3-component vertex formats on GEN7.5+
GEN7.5 gains support for those formats natively.
2014-07-24 09:54:20 +08:00
Chia-I Wu
2126541b0b ilo: allow for device-dependent format translation
Pass ilo_dev_info to all format translation functions.
2014-07-24 09:33:33 +08:00
Jason Ekstrand
6bac86cd85 i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV textures
Since intel is always going to be little-endian,
GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and
BGRA textures, so the same acceleration code will work.  We might as well
use it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-23 16:48:35 -07:00
Ian Romanick
5072d0e7fc mesa: Fix the name in the error message
Obvious copy-and-paste bug.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 16:42:47 -07:00
Ian Romanick
3f04a1532e glsl: Fix some bad indentation
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 16:42:47 -07:00
Kenneth Graunke
d4d886a0bc i965/fs: Set LastRT on the final FB write on Broadwell.
In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend
test, key->nr_color_regions == 2, but the dual source blend FB write has
ir->target set to 0.  So we failed to set "Last Render Target Select" on
any FB write message.

We only emit one FB write per render target, so my comment about setting
LastRT on every FB write directed at the last color region is a bit...
misinformed.  According to the documentation, depth buffer writes and
scoreboard updates happen on the FB write with LastRT set, so I believe
we want to set it only once.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-23 15:44:37 -07:00
Kenneth Graunke
36a4a6bbdc i965: Port INTEL_DEBUG=optimizer to the vec4 backend.
Largely via copy and paste.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 15:44:16 -07:00
Kenneth Graunke
8d2e95bd4b i965: Save the gl_shader_stage enum in backend_visitor.
This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which
needs to know whether it's currently processing a VS or GS.  It isn't
worth adding virtual methods for this case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 15:44:14 -07:00
Kenneth Graunke
d6d3e6027d i965: Don't print WE_normal in disassembly.
Dropping this helps most lines fit in an 80 column terminal.  The
absence of WE_normal also helps call attention to WE_all, where
something unusual is going on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-23 15:44:08 -07:00
Rob Clark
2f181bc391 freedreno/a3xx/compiler: fix p0 (kill, etc)
Don't assert (debug builds) or assign random uninitialized value for
predicate register (p0).. that screws up kill, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-23 15:10:53 -04:00
Tom Stellard
fb237ba746 Revert "r600g/compute: Fix warnings"
This reverts commit 467f1585e2.

This breaks the build on some systems.
2014-07-23 11:52:05 -04:00
Grigori Goronzy
2a766b0b64 radeon/llvm: fix formatting
Use K&R and same indent as most other code. No functional change
intended.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:40:41 -04:00
Grigori Goronzy
0e9cdedd2e radeon/llvm: enable unsafe math for graphics shaders
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.

Piglit didn't indicate any regressions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:40:33 -04:00
Tom Stellard
467f1585e2 r600g/compute: Fix warnings 2014-07-23 10:29:17 -04:00
Glenn Kennard
2fa6d659c3 r600g: Use hardware sqrt instruction
Piglit quick tests including sqrt pass, no other regressions,
tested on radeon 6670.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez
dbaf0bc388 r600g/compute: Remove unneeded code from compute_memory_promote_item
Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.

This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez
e7bda844e6 r600g/compute: Quick exit if there's nothing to add to the pool
This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez
90d7b09ed2 r600g/compute: Defrag the pool if it's necesary
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.

The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.

This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez
d8b6f0dacb r600g/compute: Add a function for defragmenting the pool
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.

For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00
Bruno Jiménez
1f705b2bee r600g/compute: Add a function for moving items in the pool
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.

It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.

Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-23 10:29:17 -04:00