Commit graph

243 commits

Author SHA1 Message Date
Dave Airlie
f205e19e4f radv/ac: eliminate unused vertex shader outputs. (v2)
This is ported from radeonsi, and I can see at least one
Talos shader drops an export due to this, and saves some
VGPR usage.

v2: use shared code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 05:18:52 +01:00
Dave Airlie
e2659176ce radeonsi/ac: move vertex export remove to common code.
This code can be shared by radv, we bump the max to
VARYING_SLOT_MAX here, but that shouldn't have too
much fallout.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-27 05:17:47 +01:00
Dave Airlie
7f77554b5b radv/ac: setup mrt exports then export them in one go. (v2)
Noticed while looking at Sascha Willems deferred shaders.

This is a bit of an llvm workaround, llvm was producing this:
        v_cvt_pkrtz_f16_f32_e64 v4, v7, v8                       ; D2960004 00021107
        v_cvt_pkrtz_f16_f32_e64 v6, v9, 1.0                      ; D2960006 0001E509
        s_waitcnt vmcnt(0)                                       ; BF8C0F70
        exp mrt0 v4, v4, v6, v6 compr                            ; C400040F 00000604
        s_waitcnt expcnt(0)                                      ; BF8C0F0F
        v_cvt_pkrtz_f16_f32_e64 v4, v12, v5                      ; D2960004 00020B0C
        v_cvt_pkrtz_f16_f32_e64 v5, v14, 1.0                     ; D2960005 0001E50E
        exp mrt1 v4, v4, v5, v5 compr                            ; C400041F 00000504
        s_waitcnt expcnt(0)                                      ; BF8C0F0F
        v_cvt_pkrtz_f16_f32_e64 v0, v0, v1                       ; D2960000 00020300
        v_cvt_pkrtz_f16_f32_e64 v1, v2, v3                       ; D2960001 00020702
        exp mrt2 v0, v0, v1, v1 done compr vm                    ; C4001C2F 00000100

After this change:
        v_cvt_pkrtz_f16_f32_e64 v4, v7, v8                       ; D2960004 00021107
        s_waitcnt vmcnt(0)                                       ; BF8C0F70
        v_cvt_pkrtz_f16_f32_e64 v0, v0, v1                       ; D2960000 00020300
        v_cvt_pkrtz_f16_f32_e64 v6, v9, 1.0                      ; D2960006 0001E509
        v_cvt_pkrtz_f16_f32_e64 v5, v12, v5                      ; D2960005 00020B0C
        v_cvt_pkrtz_f16_f32_e64 v7, v14, 1.0                     ; D2960007 0001E50E
        exp mrt0 v4, v4, v6, v6 compr                            ; C400040F 00000604
        v_cvt_pkrtz_f16_f32_e64 v1, v2, v3                       ; D2960001 00020702
        exp mrt1 v5, v5, v7, v7 compr                            ; C400041F 00000705
        exp mrt2 v0, v0, v1, v1 done compr vm                    ; C4001C2F 00000100

No waitcnt for exports are emitted.

v2: fixup index->mrt mapping (Bas).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-25 23:26:11 +01:00
Dave Airlie
b2cedb3ea9 radv/ac: overhaul vs output/ps input routing
In order to cleanly eliminate exports rewrite the
code first to mirror how radeonsi works for now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-25 23:24:39 +01:00
Dave Airlie
fed740eafe radv/ac: copy llvm machine feature flags from radeonsi.
This just updates this to use the same flags as radeonsi
for consistency.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-24 05:55:44 +01:00
Dave Airlie
35ea0c07a1 radv/ac: use tex_lz if we can.
Looking at some Talos shaders vs radeonsi, I noticed they use
tex_lz in a few places, so we should be able to as well.

Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-20 22:00:13 +01:00
Christoph Haag
a9d27c8a33 ac: fix build after LLVM 5.0 SVN r300718
v2: previously getWithDereferenceableBytes() exists, but addAttr() doesn't take that type

Signed-off-by: Christoph Haag <haagch+mesadev@frickel.club>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-and-reviewed-by: Mike Lothian <mike@fireburn.co.uk>
2017-04-20 10:58:19 +02:00
Mike Lothian
709ed1fa9f radv/ac: Fix nir.h include
This fixes the build after:

commit 224cf2906a
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Apr 17 13:01:52 2017 +1000

    radv/ac: add initial pre-pass for shader info gathering

Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 12:25:18 +10:00
Dave Airlie
60a93e11ba radv: drop debugging leftovers code in descriptor set patches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:31:14 +10:00
Dave Airlie
25a5ee391d radv/ac: add support for indirect access of descriptor sets.
We want to expose more descriptor sets to the applications,
but currently we have a 1:1 mapping between shader descriptor
sets and 2 user sgprs, limiting us to 4 per stage. This commit
check if we don't have enough user sgprs for the number of
bound sets for this shader, we can ask for them to be indirected.

Two sgprs are then used to point to a buffer or 64-bit pointers
to the number of allocated descriptor sets. All shaders point
to the same buffer.

We can use some user sgprs to inline one or two descriptor sets
in future, but until we have a workload that needs this I don't
 think we should spend too much time on it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
d0991b135b radv: start allocating user sgprs
This adds an initial implementation to allocate the user
sgprs and make sure we don't run out if we try to bind
a bunch of descriptor sets.

This can be enhanced further in the future if we add
support for inlining push constants.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
4087eaecd0 radv/ac: mark used descriptor sets in shader info.
This pre calculates the used descriptor sets.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:43 +10:00
Dave Airlie
0b62669c8d radv/ac: frag shader only needs ring offsets if sample positions enabled
mostly documenting things, since with modern llvm we always have the
spill enabled.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
ec4785afb7 radv/ac: move needs_push_constants to shader info.
First step to optimising push constants.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
ec15e0d301 radv: optimise compute shader grid size emission.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
31174069d2 radv: start conditionalising vertex inputs. (v2)
In practice this will probably just drop draw id in a few places.

v2: just do draw_id for now. (Bas)
it might be possible to do something more if we need it in the
future. (nha)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Dave Airlie
224cf2906a radv/ac: add initial pre-pass for shader info gathering
There is some radv specific info we need to gather from shaders
before we get into converting nir->llvm, so we can make
better decisions especially around user sgpr allocation.

This is just an initial placeholder to gather if sample positions
are required in the frag shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-19 09:00:42 +10:00
Bas Nieuwenhuizen
bd91caf863 radv: Use an offset instead of pointers for immutable samplers.
Makes more sense when we hash the layout for the pipeline cache.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-12 07:43:25 +02:00
Samuel Pitoiset
a1c37ff9e4 ac: add unreachable() in ac_build_image_opcode()
To silent the following compiler warning:

common/ac_llvm_build.c: In function ‘ac_build_image_opcode’:
common/ac_llvm_build.c:1080:3: warning: ‘name’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   snprintf(intr_name, sizeof(intr_name), "%s%s%s%s.v4f32.%s.v8i32",
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    name,
    ~~~~~
    a->compare ? ".c" : "",
    ~~~~~~~~~~~~~~~~~~~~~~~
    a->bias ? ".b" :
    ~~~~~~~~~~~~~~~~
    a->lod ? ".l" :
    ~~~~~~~~~~~~~~~
    a->deriv ? ".d" :
    ~~~~~~~~~~~~~~~~~
    a->level_zero ? ".lz" : "",
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    a->offset ? ".o" : "",
    ~~~~~~~~~~~~~~~~~~~~~~
    type);
    ~~~~~

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:02:12 +02:00
Dave Airlie
22b116171f radv: fix interp at sample code.
Interp at sample needs to use the center, since the sample
positions it retrieves are relative to the center.

This fixes a bunch of CTS tests with multisample_interpolation.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:55:21 +10:00
Dave Airlie
1171b304f3 radv: overhaul fragment shader sample positions.
The current code was broken, and I decided to redesign it instead.

This puts the sample positions for all samples into the queue
constant descriptor buffer after all the spill/ring descriptors.

It then uses a single offset register to point how far into the
samples the samples for num_samples are. This saves one user sgpr
and means we only generate the sample position data in the rare
single case where we need it currently.

This doesn't fix the failing CTS tests without the followup
fix.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:55:15 +10:00
Dave Airlie
1e9e747d00 radv/ac: fix texture derivative ordering
The ordering NIR gives us is correct for the hw, this fixes:
dEQP-VK.glsl.texture_functions.texturegrad.* (mainly trigged
on isampler/usampler 3d textures.).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:39:10 +10:00
Dave Airlie
303d22f319 radv/ac: round cube array coordinate before fixup.
This fixes:
dEQP-VK.glsl.texture_functions.texture.samplercubearray*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:39:07 +10:00
Dave Airlie
5821f676ee radv: move to using common buffer load format.
Get rid of usage of SI.vs.load.input.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:37:52 +10:00
Dave Airlie
cb1518e96b radv/ac: setup lds for tessellation
This seems to get lost in the rebases, should fix
the tessellation demos, crash in llvm.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:17:15 +10:00
Dave Airlie
aaabdd6bc6 radv/ac: handle writing out tess factors.
This ports the code from radeonsi to build the if/endif,
and ports the tess factor emission code. This code has
an optimisation TODO that we can deal with later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:47 +10:00
Dave Airlie
94f9591995 radv/ac: add support for TCS/TES inputs/outputs.
This adds support for the tessellation inputs/outputs to the
shader compiler, this is one of the main pieces of the patch.

It is very similiar to the radeonsi code (post merge we should
consider if there are better sharing opportunities). The main
differences from radeonsi, is that we can have "compact" varyings
for clip/cull/tess factors, and we have to add special handling
for these.

This consists of treating the const index from the deref different
depending on the compactness.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:42 +10:00
Dave Airlie
5ab1289b48 radv/ac: add clip support for tess eval shader.
As this may be the last shader to emit clip distances.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:37 +10:00
Dave Airlie
326b9bc6dc radv/ac: hook up tessellation intrinsics.
This just adds support for the nir intrinsics that tessellation uses.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:32 +10:00
Dave Airlie
d8ab71b207 radv/ac: hook up shader information handling for tessellation
This hooks up the tessellation shader info to the nir values
and ctx generated ones.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:27 +10:00
Dave Airlie
5b40eab00a radv: add tess ctrl stage barrier workaround for SI.
This just ports the workaround from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:04 +10:00
Dave Airlie
3a633cc2cb radv/ac: add support for patch inputs to unique index code.
This add support for tessellation patch inputs to the code
that finds the unique parameter index.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:57 +10:00
Dave Airlie
60326a7afc radv/ac: setup tessellation shader inputs.
This just configures all the register inputs for the tessellation
related stages.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:41 +10:00
Dave Airlie
3968162751 radv/ac: setup tess rings on compiler side.
This just sets up the necessary pointers on the compiler
side for the rings needed for tessellation.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:35 +10:00
Dave Airlie
2b3c4bcc1f radv/ac: add tess changes to shader keys/info
This adds the tess pieces for shader keys and shader info,
it adds the necessary bits to the vertex key/info as well.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:22 +10:00
Dave Airlie
a5136a97f7 radv: use defines for ring descriptor offsets.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:12 +10:00
Dave Airlie
97e0ff30c0 radv: handle clip dist in es outputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:53 +10:00
Dave Airlie
6279646306 radv: drop unneeded start
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:39 +10:00
Dave Airlie
a58d03a5a2 radv: fixup geometry clip emission since using the geom pass
Fixes: 2b35b60d: radv: move to using nir clip/cull merge pass.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:38 +10:00
Marek Olšák
d60f72a9f0 radeonsi/gfx9: image descriptor changes in immutable fields
The border color swizzle logic was copied from Vulkan. It doesn't make any
sense to me, but it passes all piglits except the stencil ones.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
2862300d9e radeonsi/gfx9: init_config changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
71ad666414 radeonsi/gfx9: CP DMA changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ef97cc0cae radeonsi/gfx9: add IB parser support
Both GFX6 and GFX9 fields are printed next to each other in parsed IBs.

The Python script parses both headers like one stream and tries to merge
all definitions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
68d6d097f1 radeonsi/gfx9: add GFX9 and VEGA10 enums
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
5691e14735 amd: GFX9 packet changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ecbdfbeb05 amd: define event types for GFX9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
00e777b61c amd: add texture format definitions for GFX9
the DATA_FORMAT and NUM_FORMAT fields are the same, but some of the enums
differ, thus add GFX6 and GFX9 suffixes, so that the IB parser can show
enums for both.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
e6c520362d amd: resolve remaining definition conflicts with gfx9d.h
Add _GFX6 and _GFX9 suffixes to conflicting definitions.

sid.h and gfx9d.h can now be included in the same file.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
7e7043c31c amd: normalize register definition formatting
This resolves trivial conflicts with gfx9d.h caused by different formatting.
Some fields are also renamed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
db04d4ccaa amd: import GFX9 register definitions
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00