Commit graph

1962 commits

Author SHA1 Message Date
Ilia Mirkin
9c8f017f77 glsl: add a few missing int64 constant propagation cases
Fixes KHR-GL45.shader_ballot_tests.ShaderBallotAvailability, which
causes some silly swizzles to appear, triggering this optimization to
get hit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2017-08-18 02:26:16 -04:00
Timothy Arceri
c03eefdf84 glsl: set old ldexp operand to NULL when lowering
This fixes an assert during IR validation in LLVMpipe.

Fixes: e2e2c5abd2 (glsl: calculate number of operands in an expression once)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102274
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2017-08-18 12:07:34 +10:00
Ilia Mirkin
978c4c597a glsl/ast: update rhs in addition to the var's constant_value
We continue in the code to do some more things with the rhs, including
setting a constant initializer. If the type is wrong, this causes some
confusion down the line, leading to assertions. This makes sure that the
rhs processing continues to flow as-if the type was correct to start
with (even though the state has been marked as an error state).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101766
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2017-08-15 22:14:05 -04:00
Timothy Arceri
9d41ec2182 glsl: stop cloning builtin fuctions _mesa_glsl_find_builtin_function()
The cloning was introduced in f81ede4699 to fix a problem with
shaders including IR that was owned by builtins.

However the approach of cloning the whole function each time we
reference a builtin lead to a significant reduction in the GLSL
IR compilers performance.

The previous patch fixes the ownership problem in a more precise
way. So we can now remove this cloning.

Testing on a Ryzen 7 1800X shows a ~15% decreases in compiling the
Deus Ex: Mankind Divided shaders on radeonsi (which take 5min+ on
some machines). Looking just at the GLSL IR compiler the speed up
is ~40%.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-11 15:44:15 +10:00
Timothy Arceri
77f5221233 glsl: pass mem_ctx to constant_expression_value(...) and friends
The main motivation for this is that threaded compilation can fall
over if we were to allocate IR inside constant_expression_value()
when calling it on a builtin. This is because builtins are shared
across the whole OpenGL context.

f81ede4699 worked around the problem by cloning the entire
builtin before constant_expression_value() could be called on
it. However cloning the whole function each time we referenced
it lead to a significant reduction in the GLSL IR compiler
performance. This change along with the following patch
helps fix that performance regression.

Other advantages are that we reduce the number of calls to
ralloc_parent(), and for loop unrolling we free constants after
they are used rather than leaving them hanging around.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-08-11 15:44:08 +10:00
Timothy Arceri
d4f79e995f glsl: use ralloc_str_append() rather than ralloc_asprintf_rewrite_tail()
The Deus Ex: Mankind Divided shaders go from spending ~20 seconds
in the GLSL IR compilers front-end down to ~18.5 seconds on a
Ryzen 1800X.

Tested by compiling once with shader-db then deleting the index file
from the shader cache and compiling again.

v2:
 - fix rebasing issue in v1

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2017-08-11 10:43:34 +10:00
Timothy Arceri
53320e25b4 glsl: remove unused field from ir_call
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-08-11 10:43:27 +10:00
Timothy Arceri
49d9286a3f glsl: stop copying struct and interface member names
We are currently copying the name for each member dereference
but we can just share a single instance of the string provided
by the type.

This change also stops us recalculating the field index
repeatedly.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2017-08-11 10:43:21 +10:00
Timothy Arceri
43cbcbfee9 glsl: tidy up get_num_operands()
Also add a comment that this should only be used by the ir_reader
interface for testing purposes.

v2:
 - fix grammar in comment
 - use unreachable rather than assert

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2017-08-11 10:43:16 +10:00
Timothy Arceri
e2e2c5abd2 glsl: calculate number of operands in an expression once
Extra validation is added to ir_validate to make sure this is
always updated to the correct numer of operands, as passes like
lower_instructions modify the instructions directly rather then
generating a new one.

The reduction in time is so small that it is not really
measurable. However callgrind was reporting this function as
being called just under 34 million times while compiling the
Deus Ex shaders (just pre-linking was profiled) with 0.20%
spent in this function.

v2:
 - make num_operands a unit8_t
 - fix unsigned/signed mismatches

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2017-08-11 10:43:12 +10:00
Samuel Pitoiset
269c37a676 glsl: update the extensions/functions that are enabled for 460
Other ones are either unsupported or don't have any helper
function checks.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-08-07 21:06:54 +02:00
Jason Ekstrand
95c6a97464 spirv: Fix SpvImageFormatR16ui
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.1 17.2" <mesa-stable@lists.freedesktop.org>
2017-08-02 09:15:01 -07:00
Samuel Pitoiset
1f4ceb8be1 glsl: recognize GLSL 4.60
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-02 13:36:39 +02:00
Juan A. Suarez Romero
c4210dec8a glsl: look up for transform feedback varyings after linking
Check if shaders have transform feedback varyings also after the
post-link step.

This fixes:
KHR-GL45.enhanced_layouts.xfb_vertex_streams
piglit/spec/arb_enhanced_layouts/gs-stream-location-aliasing

v2: add claryfing comments (Timothy)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-08-02 10:04:12 +02:00
Connor Abbott
de91461575 nir: fix algebraic optimizations
The optimizations are only valid for 32-bit integers. They were
mistakenly firing for 64-bit integers as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-08-01 12:20:49 -07:00
Juan A. Suarez Romero
54826331b3 glsl: xfb_stride applies to buffers, not block members
When we have an interface block like:

layout (xfb_buffer = 0, xfb_offset = 0) out Block {
                             vec4 var1;
    layout (xfb_stride = 48) vec4 var2;
                             vec4 var3;
};

According to ARB_enhanced_layouts spec:

   "The *xfb_stride* qualifier specifies how many bytes are consumed by
    each captured vertex.  It applies to the transform feedback buffer
    for that declaration, whether it is inherited or explicitly
    declared. It can be applied to variables, blocks, block members, or
    just the qualifier out. [ ...] While *xfb_stride* can be declared
    multiple times for the same buffer, it is a compile-time or
    link-time error to have different values specified for the stride
    for the same buffer."

This means xfb_stride actually applies to the buffer, and not to the
individual components.

In the above example, it means that var2 consumes 16 bytes, and var3 is
at offset 32.

This has been confirmed also by John Kessenich, the main contact for the
ARB_enhanced_layouts specs, and also because this commit fixes:

GL45.enhanced_layouts.xfb_block_member_stride

This commit is in practice a revert of 598790e856 (glsl: apply
xfb_stride to implicit offsets for ifc block members).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 15:58:24 +00:00
Nicolai Hähnle
c5f97eab09 st/mesa: get rid of st_glsl_types
It's a duplicate of glsl_type::count_attribute_slots.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:30 +02:00
Nicolai Hähnle
e902ac3268 nir: add nir_lower_uniforms_to_ubo pass
This is a further lowering of default-block uniform loads that transforms
load_uniform intrinsics into load_ubo intrinsics. This simplifies the rest
of the backend.

v2: transform from load_uniform instead of straight from variables

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-07-31 14:55:29 +02:00
Nicolai Hähnle
bce6f99875 nir: add nir_lower_samplers_as_deref pass
This pass is a replacement for the nir_lower_samplers pass, which has the
advantage of keeping sampler references as derefs. This allows a unified
treatment of texture instructions and image intrinsics in the backend.
2017-07-31 14:55:29 +02:00
Nicolai Hähnle
f1da97ef7a nir: add load_frag_coord system value intrinsic
Some drivers prefer to treat gl_FragCoord as a system value rather than
a fragment shader input, see Const.GLSLFragCoordIsSysVal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-31 14:55:28 +02:00
Nicolai Hähnle
5011923e09 nir: fix nir_lower_wpos_ytransform when gl_FragCoord is a system value
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-31 14:55:28 +02:00
Nicolai Hähnle
b27c2d402e nir: add nir_instr_rewrite_deref
Allows modifying a texture instruction's texture and sampler derefs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-31 14:55:28 +02:00
Timothy Arceri
6ee3323d7d glsl: small builtin inline tidy up
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-27 22:14:37 +10:00
Timothy Arceri
b0333e55b7 compiler: move glsl_interface_packing enum to shader_enums.h
This allows us to drop the duplicate gl_uniform_block_packing enum.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 10:39:52 +10:00
Timothy Arceri
d91108b1f4 glsl: rework misleading block layout code
From the ARB_uniform_buffer_object spec:

   ""shared" uniform blocks, the default layout, ..."

This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23 10:06:01 +10:00
Timothy Arceri
316b4c9ada glsl: remove placeholder comment
This was added in 2d03f48a65 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23 10:06:01 +10:00
Chih-Wei Huang
c1a29e104c Android: fix spirv_info.c generation
It's incorrect to use $(LOCAL_PATH) in makefile recipes since it's
changing. The typical way to handle it is to use private variable.
Fortunately in this case we can just simplify them to $^.

See further:
https://patchwork.freedesktop.org/patch/167718/

Also simplify LOCAL_GENERATED_SOURCES.

Fixes: 2dd4e2ec (spirv: Generate spirv_info.c)

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-21 08:48:45 +03:00
Tapani Pälli
b78563f0d0 android: fix libmesa_nir build
current build did not find required include 'spirv_info.h'

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-21 08:47:56 +03:00
Matt Turner
aff108f2fd nir: Optimize find_lsb/imsb/umsb error checks
Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-07-20 16:56:50 -07:00
Matt Turner
1038d385a9 nir: Reduce destination size of ballot intrinsic when possible
Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
3e7b8f6cd4 nir: Add pass to scalarize read_invocation/read_first_invocation
i965 will want these to be scalar operations.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
43ef75b394 nir: Add system values from ARB_shader_ballot
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.

Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
636fe4d1c6 nir: Add intrinsics from ARB_shader_ballot
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
742cc6118a nir: Support lowering vote intrinsics
... trivially (as allowed by the spec!) by reusing the existing
nir_opt_intrinsics code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
d4c9d6a3b2 nir: Add pass to optimize intrinsics
Specifically, constant fold intrinsics from ARB_shader_group_vote, but I
suspect it'll be useful for other things in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
ba2fbbf1c0 nir: Add intrinsics from ARB_shader_group_vote
These are intrinsics rather than opcodes, because they operate across
channels.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Kenneth Graunke
0320bb2c6c nir: Use nir_src_copy instead of direct assignments.
If the source is an indirect register, there is ralloc'd data.  Copying
with a direct assignment will copy the pointer, but the data will still
belong to the old instruction's memory context.  Since we're lowering
and throwing away instructions, that could free the data by mistake.

Instead, use nir_src_copy, which properly handles this.

This is admittedly not a common case, so I think the bug is real,
but unlikely to be hit.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-18 23:44:50 -07:00
Timothy Arceri
57165f2ef8 glsl: disable array splitting for AoA
While it produces functioning code the pass creates worse code
for arrays of arrays. See the comment added in this patch for more
detail.

V2: skip splitting of AoA of matrices too.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-19 11:06:23 +10:00
Timothy Arceri
3f0fb23b03 nir: fix nir_opt_copy_prop_vars() for arrays of arrays
Previously we only incremented the guide for a single
dimension/wildcard.

V2: rework logic to avoid code duplication

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2017-07-19 11:06:23 +10:00
Jason Ekstrand
ecf91898e0 nir/vars_to_ssa: Handle missing struct members in foreach_deref_node
This can happen if, for instance, you have an array of structs and there
are both direct and wildcard references to the same struct and some
members only have direct or only have indirect.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
2017-07-19 11:06:23 +10:00
Kenneth Graunke
c2bb39d8d6 build: Add $(top_srcdir)/src/compiler/spirv to AM_CPPFLAGS
Generated C files try to include spirv_info.h.  For in-tree builds,
the header is in the same directory, so it just works.  For out-of-tree
builds, we need to look for it in srcdir rather than builddir.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101831
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 11:14:47 -07:00
Jason Ekstrand
2f52d2dc97 compiler/spirv: Add a .gitignore and ignore spirv_info.c 2017-07-18 09:49:13 -07:00
Jason Ekstrand
f2fe74a462 nir/spirv: Add support for SPV_KHR_variable_pointers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
182950ceaf nir/spirv: Add a helper for pushing SSA values
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
868456fbf7 nir/spirv: Implement OpPtrAccessChain for buffers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
a968889237 spirv/nir: Add some useful asserts for type decorations
Now that vtn_type has piles of unions, we should assert sanity before
setting fields that may stomp others.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
999918bd01 spirv: Add support for the StorageBuffer storage class
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Ian Romanick
2dd4e2ece3 spirv: Generate spirv_info.c
The old table based spirv_*_to_string functions would return NULL for
any values "inside" the table that didn't have entries.  The tables also
needed to be updated by hand each time a new spirv.h was imported.
Generate the file instead.

v2: Make this script work more like src/mesa/main/format_fallback.py.
Suggested by Jason.  Remove SCons supports.  Suggested by Jason and
Emil.  Put all the build work in Makefile.nir.am in lieu of adding a new
Makefile.spirv.am.  Suggested by Emil.  Add support for Android builds
based on code provided by Emil.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 09:43:12 -07:00
Ian Romanick
de765ec9dc spirv: Import the lastest 1.0.2 JSON from Khronos
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
7141e8105a spirv: Import the latest 1.2 header from Khronos
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00