fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-27 14:28:22 +02:00

Author	SHA1	Message	Date
Matt Turner	69ad5fd4ce	glsl: Optimize (f2i(trunc x)) into (f2i x). total instructions in shared programs: 5950326 -> 5949286 (-0.02%) instructions in affected programs: 88264 -> 87224 (-1.18%) helped: 692	2015-02-11 13:50:19 -08:00
Matt Turner	c262b2b582	glsl: Optimize round-half-up pattern. Hurts some Psychonauts shaders, but after the next patch (which this enables) they're fewer instructions than before this patch.	2015-02-11 13:50:19 -08:00
Matt Turner	a5455ab1ca	glsl: Add trunc() to ir_builder.	2015-02-11 13:50:19 -08:00
Matt Turner	4c42e1116b	nir: Recognize open-coded fmin/fmax. And unfortunately other shaders do the same thing but with >=/<= which we can't apply this optimization to because of NaNs. instructions in affected programs: 23309 -> 22938 (-1.59%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-11 13:50:19 -08:00
Eric Anholt	56e21647e2	nir: Add algebraic opt for int comparisons with identical operands. No change on shader-db on i965. v2: Reword the comment due to feedback from Erik Faye-Lund Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)	2015-02-11 11:52:38 -08:00
Eric Anholt	2919bdf466	nir: Fix load_const comparisons for CSE. We want the size of a float per component, not the size of a whole vec4. NIR instructions on i965: total instructions in shared programs: 1261937 -> 1261929 (-0.00%) instructions in affected programs: 114 -> 106 (-7.02%) Looking at one of these examples (tesseract), it's from vec4 load_consts for a MRT solid fill, which do get CSEed now that we don't memcmp off the end of the const value and into the SSA def. For the 1-component loads that are common in i965, we were only memcmping off into the rest of the usually zero-filled const_value. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-11 11:52:38 -08:00
Matt Turner	ea0f0eb6c0	glsl: Optimize 1/exp(x) into exp(-x). Lots of shaders divide by exp2(...) which we turn into a multiplication by the reciprocal. We can avoid the reciprocal by simply negating exp2's argument. total instructions in shared programs: 5947154 -> 5946695 (-0.01%) instructions in affected programs: 118661 -> 118202 (-0.39%) helped: 380 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-10 17:48:44 -08:00
Matt Turner	a9065cef48	nir: Remove casts from void*. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-10 17:48:42 -08:00
Matt Turner	bb1e007157	nir: Replace assert(0) with unreachable(). Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-10 17:48:31 -08:00
Matt Turner	942b56ad05	nir: Remove unused has_indirect variable. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-10 17:48:16 -08:00
Francisco Jerez	e6146e6f14	glsl: Forbid calling the constructor of any opaque type. The spec doesn't define any opaque type constructors. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-10 15:49:43 +02:00
Francisco Jerez	c4111dfa0a	glsl: Return correct number of coordinate components for cubemap array images. Cubemap array images are unlike cubemap array samplers in that they don't need an additional coordinate to index individual cubemaps in the array, instead they behave like a 2D array of 6n layers, with n the number of cubemaps in the array. Take this exception into account. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-10 15:49:43 +02:00
Kenneth Graunke	480ee1f0b4	nir: Mark nir_print_instr's instr pointer as const. Printing instructions doesn't modify them, so we can mark the parameter const. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-10 03:37:55 -08:00
Eric Anholt	bff4cbdafa	nir: Fix broken fsat recognizer. We've probably never seen this ridiculous pattern in the wild, so it didn't matter. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-06 15:57:55 -08:00
Eric Anholt	6706537dd4	nir: Slightly simplify algebraic code generation by reusing a struct. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-06 15:57:55 -08:00
Iago Toral Quiroga	71a36e0a2c	glsl: GLSL ES identifiers cannot exceed 1024 characters v2 (Ian Romanick) - Move the check to the lexer before rallocing a copy of the large string. Fixes the following 2 dEQP tests: dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-06 12:21:42 +01:00
Connor Abbott	a135f34080	nir: add an optimization to remove useless phi nodes This removes phi nodes whose sources all point to the same thing. Shader-db results: total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%) NIR instructions in affected programs: 126564 -> 122480 (-3.23%) helped: 615 HURT: 0 total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%) FS instructions in affected programs: 24622 -> 23174 (-5.88%) helped: 138 HURT: 0 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Tested-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 16:00:13 -05:00
Jason Ekstrand	572d1f6e41	nir/validate: Ensure that phi sources are SSA-only Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 12:52:42 -08:00
Jason Ekstrand	5420774510	nir/validate: Validate that only float ALU outputs are saturated Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 12:46:55 -08:00
Jason Ekstrand	c0df85cca4	nir/lower_source_mods: Don't lower saturate for non-float outputs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 12:46:38 -08:00
Jason Ekstrand	f2adcd36cb	nir: Add a pass to lower vector phi nodes to scalar phi nodes v2 Jason Ekstrand <jason.ekstrand@intel.com>: - Add better comments - Use nir_ssa_dest_init and nir_src_for_ssa more places - Fix some void * casts v3 Jason Ekstrand <jason.ekstrand@intel.com>: - Rework the way we determine whether or not to scalarize a phi node to make the recursion non-bogus - Treat load_const instructions as scalarizable v4 Jason Ekstrand <jason.ekstrand@intel.com>: - Allow uniform and input loads to be scalarizable v5 Jason Ekstrand <jason.ekstrand@intel.com>: - Also consider loads of inputs (varying, uniform, or ubo) to be scalarizable. We were already doing this for load_var on uniforms and inputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-03 12:33:11 -08:00
Matt Turner	d8be1b9aba	glsl/list: Note that exec_lists may not be realloc'd. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-03 12:25:14 -08:00
Iago Toral Quiroga	5dfb085ff3	glsl: Improve precision of mod(x,y) Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement mod(x,y) as y * fract(x/y). This implementation has a down side though: it introduces precision errors due to the fract() operation. Even worse, since the result of fract() is multiplied by y, the larger y gets the larger the precision error we produce, so for large enough numbers the precision loss is significant. Some examples on i965: Operation Precision error ----------------------------------------------------- mod(-1.951171875, 1.9980468750) 0.0000000447 mod(121.57, 13.29) 0.0000023842 mod(3769.12, 321.99) 0.0000762939 mod(3769.12, 1321.99) 0.0001220703 mod(-987654.125, 123456.984375) 0.0160663128 mod( 987654.125, 123456.984375) 0.0312500000 This patch replaces the current lowering pass with a different one (MOD_TO_FLOOR) that follows the recommended implementation in the GLSL man pages: mod(x,y) = x - y * floor(x/y) This implementation eliminates the precision errors at the expense of an additional add instruction on some systems. On systems that can do negate with multiply-add in a single operation this new implementation would come at no additional cost. v2 (Ian Romanick) - Do not clone operands because when they are expressions we would be duplicating them and that can lead to suboptimal code. Fixes the following 16 dEQP tests: dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_* dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_* Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-03 13:19:36 +01:00
Iago Toral Quiroga	ec7dcaf578	glsl: can't have 'const' qualifier used with struct or interface block members Fixes the following 2 dEQP tests: dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-03 13:19:36 +01:00
Iago Toral Quiroga	5d655a43e6	glsl: interface blocks must be declared at global scope Fixes the following 2 dEQP tests: dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-02-03 13:19:36 +01:00
Kenneth Graunke	0f06f12c11	glsl: Pick ast_conditional branch regardless of op1/2 being constant. If the ?: operator's condition is a constant value, and both branches were pure expressions, we can just make the resulting value one or the other. Previously, we only did this if op[1] and op[2] were also constant values - but there's no actual reason for that restriction. No changes in shader-db, probably because we usually optimize this later anyway. But it does make us generate less stupid code up front. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-02-02 17:14:55 -08:00
Jason Ekstrand	604ae33c8b	nir/opt_algebraic: Add some constant bcsel reductions total instructions in shared programs: 5998190 -> 5997603 (-0.01%) instructions in affected programs: 54276 -> 53689 (-1.08%) helped: 293 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:11:13 -08:00
Jason Ekstrand	7f19cd5a56	nir/opt_algebraic: Add some boolean simplifications total instructions in shared programs: 5998321 -> 5998287 (-0.00%) instructions in affected programs: 4520 -> 4486 (-0.75%) helped: 8 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:11:10 -08:00
Jason Ekstrand	70273c5cd5	nir/algebraic: Support specifying variable as constant or by type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	81f77e4f3a	nir/algebraic: Fail to compile if a variable is used in a replace but not the search Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	026b5cc792	nir/search: Allow for matching variables based on types This allows you to match on an unknown value but only if it is of a given type. 90% of the uses of this are for matching only booleans, but adding the generality of arbitrary types is no more complex. nir_algebraic.py doesn't handle this yet but that's ok because the C language will ensure that the default type on all variables is void. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	d8999bcdce	nir/search: Add support for matching unknown constants There are some algebraic transformations that we want to do but only if certain things are constants. For instance, we may want to replace a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant. While this generates more instructions, some of it will get constant folded. nir_algebraic.py doesn't handle this yet, but that's ok because the C language will make sure that false is the default for now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	5ab1489ae6	nir: Add an invalid type This allows us to indicate a concept of an invalid type. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Eric Anholt	fc884eadf1	nir: Add variants of some of the comparison simplifications. We end up with these from TGSI-to-NIR because the pass generating the comparisons doesn't know if the arg is actually a bool input or not. vc4 results: total instructions in shared programs: 41801 -> 41508 (-0.70%) instructions in affected programs: 4253 -> 3960 (-6.89%) Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-29 11:44:06 -08:00
Eric Anholt	9a3a60cb13	nir: Don't try to to-SSA ALU instructions that are already SSA. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-29 11:43:33 -08:00
Eric Anholt	68d476167c	nir: Fix a bit of broken indentation. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-29 11:42:08 -08:00
Eric Anholt	36c604c824	nir: Add a couple of helpers for glsl types. This will be used by tgsi_to_nir, which needs to get vec4 types for declaring shader input/output variables. v2: Add a missing space. Reviewed-by: Matt Turner <mattst88@gmail.com> (v2) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-29 11:41:17 -08:00
Eric Anholt	dd4d9a4e62	nir: Make vec-to-movs handle src/dest aliasing. It now emits vector MOVs instead of a series of individual MOVs, which should be useful to any vector backends. This pushes the problem of src/dest aliasing of channels on a scalar chip to the backend, but if there are any vector operations in your shader then you needed to be handling this already. Fixes fs-swap-problem with my scalarizing patches. v2: Rename to insert_mov(), and add a comment about what it does. v3: Rewrite the comment. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)	2015-01-28 16:33:34 -08:00
Jason Ekstrand	bb26ebac13	nir/opcodes: Use a return type of tfloat for ldexp Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-28 13:21:40 -08:00
Jason Ekstrand	f0340ff625	Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp" This reverts commit `d7d340fb2f`. We have an isnormal() implementation available, the only problem was that we had the wrong return type (fixed in a later patch). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Acked-by: Matt Turner <mattst88@gmail.com>	2015-01-28 13:19:47 -08:00
Jason Ekstrand	d7d340fb2f	nir/opcodes: Use fpclassify() instead of isnormal() for ldexp Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-01-28 03:42:41 -08:00
Connor Abbott	f1a9252def	nir: fix a bug with constant folding non-per-component instructions Before, we were only copying the first N channels, where N is the size of the SSA destination, which is fine for per-component instructions, but non-per-component instructions like fdot3 can have more source components than destination components. Fix this using the helper function introduced in the last patch. v2: use new helper name Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 21:26:36 -05:00
Connor Abbott	816f0515a2	nir: add a helper function for getting the number of source components Unlike with non-SSA ALU instructions, where if they're per-component you have to look at the writemask to know which source channels are being used, SSA ALU instructions always have all the possible channels enabled so we can just look at the number of components in the SSA definition for per-component instructions to say how many source components are being used. v2: use new name nir_ssa_alu_instr_src_components() Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 21:26:36 -05:00
Jason Ekstrand	dd74369a0a	nir/opcodes: Don't go through doubles when constant-folding iabs Previously, we called the abs() function in math.h. However, this involves unnecessarily going through double. This commit changes it to use integers directly with a ternary. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-26 11:25:02 -08:00
Jason Ekstrand	9bd28fe3a3	nir/opcodes: Simplify and fix the unpack_half__split_ constant expressions Previously, these functions were explicitly writing to dst.x and dst.y. However they both return only one component so writing to dst.y is invalid. Also, since they only return one component, we don't need the explicit assignment in the expression and can simplify it use an implicit assignment. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 11:25:02 -08:00
Jason Ekstrand	27c6e3e4ca	nir: Use pointers for nir_src_copy and nir_dest_copy This avoids the overhead of copying structures and better matches the newly added nir_alu_src_copy and nir_alu_dest_copy. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 11:24:58 -08:00
Connor Abbott	0aa31bf9c3	nir/constant_folding: use the new constant folding infrastructure Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-24 21:35:35 -08:00
Jason Ekstrand	89285e4d47	nir: add new constant folding infrastructure Add a required field to the Opcode class, const_expr, that contains an expression or statement that computes the result of the opcode given known constant inputs. Then take those const_expr's and expand them into a function that takes an opcode and an array of constant inputs and spits out the constant result. This means that when adding opcodes, there's one less place to update, and almost all the opcodes are self-documenting since the information on how to compute the result is right next to the definition. The helper functions in nir_constant_expressions.c were taken from ir_constant_expressions.cpp. v3 Jason Ekstrand <jason.ekstrand@iastate.edu> - Use mako to generate one function per opcode instead of doing piles of string splicing v4 Jason Ekstrand <jason.ekstrand@iastate.edu> - More comments and better indentation in the mako - Add a description of the constant expression language in nir_opcodes.py - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-24 21:35:35 -08:00
Connor Abbott	fa4bc6c130	nir: use Python to autogenerate opcode information Before, we used a system where a file, nir_opcodes.h, defined some macros that were included to generate the enum values and the nir_op_infos structure. This worked pretty well, but for development the error messages were never very useful, Python tools couldn't understand the opcode list, and it was difficult to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which contains all the enum names and gets included into nir.h like before. In addition to solving the above problems, using Python and Mako to generate everything means that it's much easier to keep information centralized as we add new things like constant propagation that require per-opcode information. v2: - make Opcode derive from object (Dylan) - don't use assert like it's a function (Dylan) - style fixes for fnoise, use xrange (Dylan) - use iterkeys() in nir_opcodes_h.py (Dylan) - use pydoc-style comments (Jason) - don't make fmin/fmax commutative and associative yet (Jason) Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> v3 Jason Ekstrand <jason.ekstrand@intel.com> - Alphabetize source file lists - Generate nir_opcodes.h in the builddir instead of the source dir - Include $(builddir)/src/glsl/nir in the i965 build - Rework nir_opcodes.h generation so it generates a complete header file instead of one that has to be embedded inside an enum declaration	2015-01-24 21:33:56 -08:00
Matt Turner	579157e6c1	glsl: Add a foreach_in_list_reverse_safe macro. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-23 17:57:39 -08:00
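
The "glsl: Improve precision of mod(x,y)" commit (5dfb085ff3) in the log above contrasts two lowerings: y * fract(x/y) and x - y * floor(x/y). The following is a minimal sketch of that comparison, assuming each intermediate result is rounded to 32-bit float. The names f32, mod_fract and mod_floor are hypothetical helpers written for this illustration, not Mesa functions, and the sketch makes no attempt to reproduce the i965 error figures quoted in the commit message.

```python
import math
import struct


def f32(value):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def mod_fract(x, y):
    """mod(x, y) lowered as y * fract(x / y), rounding every step to float32."""
    q = f32(x / y)
    frac = f32(q - math.floor(q))  # fract(q); this subtraction can round
    return f32(y * frac)           # the rounding error in frac is scaled by y


def mod_floor(x, y):
    """mod(x, y) lowered as x - y * floor(x / y), rounding every step to float32."""
    q = f32(x / y)
    return f32(x - f32(y * math.floor(q)))


# Print both lowerings side by side for a few float32 inputs.
for x, y in [(5.5, 2.0), (-1.0, 3.0), (3769.12, 321.99)]:
    x, y = f32(x), f32(y)
    print(x, y, mod_fract(x, y), mod_floor(x, y))
```

In exact arithmetic the two formulas agree. Under float32 rounding they can differ, because any rounding error in fract(x/y) is multiplied by y, which is the effect the commit message describes. For mod(-1.0, 3.0) the floor form in this sketch returns exactly 2.0, while the fract form only lands close to it.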

3254 commits