fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-05 05:18:08 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	faaca23734	i965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer. In order to support calling lower_load_payload() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We recently did the equivalent change in the vec4 backend (commit `9b8bd67768`). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-16 12:38:26 -08:00
Neil Roberts	a4ab08bf45	format_utils: Use a more precise conversion when decreasing bits When converting to a format that has fewer bits the previous code was just shifting off the bits. This doesn't provide very accurate results. For example when converting from 8 bits to 5 bits it is equivalent to doing this: x * 32 / 256 This works as if it's taking a value from a range where 256 represents 1.0 and scaling it down to a range where 32 represents 1.0. However this is not correct because it is actually 255 and 31 that represent 1.0. We can do better with a formula like this: (x * 31 + 127) / 255 The +127 is to make it round correctly. The new code has a special case to use uint64_t when the result of the multiplication would overflow an unsigned int. This function is inline and only ever called with constant values so hopefully the if statements will be folded. The main incentive to do this is to make the CPU conversion path pick the same values as the hardware would if it did the conversion. This fixes failures with the ‘texsubimage pbo’ test when using the patches from here: http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html v2: Use 64-bit arithmetic when src_bits+dst_bits > 32 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-16 13:53:15 +00:00
Iago Toral Quiroga	6367ca8b41	i965/gen6: Fix crash with VS+TF after rendering with GS Rendering with a GS and then using transform feedback with a program that does not have a GS can crash in gen6. The reason for this is that brw_begin_transform_feedback checks brw->geometry_program to decide if there is a GS program, but this is not correct: brw->geometry_program is updated when issuing drawing commands, so after rendering with a GS it will be non-NULL until we draw again with a program that does not have a GS. If the next program uses TF, we will call glBeginTransformFeedback before issuing the drawing command and hence brw->geometry_program will be non-NULL if the previous rendering used a GS. The right thing to do here is to check ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the gen7 code path does too. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694 Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2015-01-16 14:16:59 +01:00
Jason Ekstrand	bc6e57e019	nir/live_variables: Use a worklist This is a rework of the liveness algorithm using a worklist as suggested by Connor. Doing so reduces the number of times we walk over the instructions because we don't have to do an entire pointless walk over the instructions just to figure out it's time to stop. Also, the stuff after the last loop in the function will only ever get visited once. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 16:54:21 -08:00
Jason Ekstrand	4839d1aed1	nir: Add a worklist helper structure A worklist is a common concept in optimizations. This adds a structure that we can reuse for many different types of optimizations. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 16:54:21 -08:00
Brian Paul	0aaaa13ec9	nir: fix incorrect argument passed to validate_src() in validate_tex_instr() Silences a compiler warning. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 17:41:42 -07:00
Brian Paul	aa479a69d6	nir: silence compiler warning from visit_src() call v2: use proper argument Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 17:09:02 -07:00
Brian Paul	337eca4ac8	mesa: move GET_CURRENT_CONTEXT() to top of _mesa_init_renderbuffer() To fix MSVC build. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-15 16:15:34 -07:00
Mike Mason	e407fb1af4	mesa: Fix render buffer initial internal format in GLES 3 Changes the initial internal format of a render buffer to GL_RGBA4 in GLES 3. This fixes a failure in the following DrawElements test: dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format Reviewed-by: Chad Versace <chad.versace@intel.com>	2015-01-15 13:29:48 -08:00
Jason Ekstrand	153b8b3525	util/hash_set: Rework the API to know about hashing Previously, the set API required the user to do all of the hashing of keys as it passed them in. Since the hashing function is intrinsically tied to the comparison function, it makes sense for the hash set to know about it. Also, it makes for a somewhat clumsy API as the user is constantly calling hashing functions many of which have long names. This is especially bad when the standard call looks something like _mesa_set_add(ht, _mesa_pointer_hash(key), key); In the above case, there is no reason why the hash set shouldn't do the hashing for you. We leave the option for you to do your own hashing if it's more efficient, but it's no longer needed. Also, if you do do your own hashing, the hash set will assert that your hash matches what it expects out of the hashing function. This should make it harder to mess up your hashing. This is analogous to `94303a0750` where we did this for hash_table Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 13:21:27 -08:00
Jason Ekstrand	4c99e3ae78	util: Move main/set to util/hash_set Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 13:21:27 -08:00
Jason Ekstrand	8ed5305d28	hash_table: Rename insert_with_hash to insert_pre_hashed We already have search_pre_hashed. This makes the APIs match better. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 13:21:27 -08:00
Matt Turner	f0aec4ee1e	i965: Don't consider null dst instructions as matching non-null dst. When performing common subexpression elimination on instructions with non-null destinations we emit a MOV to copy the result to a new register that must have no other uses. In the case of: cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f ... cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f we put the first instruction in the AEB and decided that we could reuse its result when we found the second. Unfortunately, that meant that we'd emit a MOV from the first's destination, which is null. Don't do anything if the entry's destination is null and the instruction's destination is non-null. Tested-by: Tapani Pälli <tapani.palli@intel.com>	2015-01-15 10:11:42 -08:00
Matt Turner	41d9f232b6	i965/vec4: Make sure that imm writes are to registers in the same file. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887	2015-01-15 10:11:42 -08:00
Matt Turner	3654b6d43c	i965/fs: Emit MADs from (x + abs(y * z)). Just use the abs source modifier on both of the multiplicand arguments. instructions in affected programs: 300 -> 296 (-1.33%) Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-01-15 10:10:44 -08:00
Matt Turner	c4fab711ed	i965/fs: Emit MADs from (x + -(y * z)). Just use the negation source modifier on one of the multiplicand arguments. total instructions in shared programs: 5889529 -> 5880016 (-0.16%) instructions in affected programs: 600846 -> 591333 (-1.58%) Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-01-15 10:10:44 -08:00
Jason Ekstrand	0d05d1226e	nir/algebraic: Only replace an instruction once Without the break, it was possible that an instruction would match multiple expressions. If this happened, you could end up trying to replace it multiple times and get a segfault. This makes it so that, after a successful replacement, it moves on to the next instruction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	c56adc68e2	i965/nir: Do a final copy lowering pass before lowering locals to regs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	0f85310975	nir/vars_to_ssa: Use the copy lowering from lower_var_copies Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	d3636da902	nir: Add a pass for lowering copy instructions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	700ba5daaf	nir/vars_to_ssa: Refactor get_deref_node This refactor allows you to more easily get the deref node associated with a given variable. We then use that new functionality in the deref_may_be_aliased function instead of creating a 1-element deref chain. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	55b5058e69	nir: Rename lower_variables to lower_vars_to_ssa The original name wasn't particularly descriptive. This one indicates that it actually gives you SSA values as opposed to the old pass which lowered variables to registers. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	4aa6162f6e	nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src array This solves a number of problems. First is the ability to change the number of sources that a texture instruction has. Second, it solves the dilemma that may occur if a texture instruction has more than 4 sources. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	dcb1acdea0	nir/validate: Only build in debug mode Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	347ab2bf24	nir/lower_variables: Improve documentation Additional description was added to a variety of places. Also, we no longer use the term "leaf" to describe fully-qualified direct derefs. Instead, we simply use the term "direct" or spell it out completely. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	8016fa39e1	nir/lower_variables: Use a for loop for get_deref_node Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	0c0ca8b6ae	nir: Use the actual FNV-1a hash for hashing derefs We also switch to using loops rather than recursion. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	a3b73ccf6d	util/hash_table: Pull the details of the FNV-1a into helpers This way the basics of the FNV-1a hash can be reused to easily create other hashing functions. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	e4115ca9d8	nir: Make intrinsic flags into an enum This should be much better for debugging as GDB will pick up on the fact that it's an enum and actually tell you what you're looking at instead of giving you some arbitrary hex value you have to go look up. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	ed13f4e716	nir: Use static inlines instead of macros for list getters This should make debugging a lot easier as GDB handles static inlines much better than macros. Also, static inlines are typesafe. Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	b95fae034f	nir/variable: Remove the constant_value field This was a left-over relic of GLSL IR that we aren't using for anything. If we ever want that value again, we can add it back, but NIR constant folding should be just as good as GLSL IR's if not better pretty soon, so I'm not worried about it. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	8599b30c67	nir: Add some documentation Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	ad9d0a9ea6	nir/lower_variables: Follow the Cytron paper more closely Previously, our variable renaming algorithm, while similar to the one in the Cytron paper, was not the same. While I'm pretty sure it was correct, it will be easier for readers of the code in the variable renaming pass if it follows more closely. This commit removes the automatic stack popping we were doing and replaces it with explicit popping like Cytron does. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	b1d114a48c	nir/print: Various cleanups recommended by Eric Cc: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	e2763339fe	nir/lower_variables: Add a bunch of comments and re-arrange a few things This commit seeks to make the lower_variables pass much more clear by adding a pile of comments and re-arranging a few things. There are no functional or algorithmic changes. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	40ca129ed5	nir: Rename parallel_copy_copy to parallel_copy_entry and add a foreach macro parallel_copy_copy was a silly name. Also, things were getting long and annoying, so I added a foreach macro. For historical reasons, several of the original iterations over parallel copy entries in from_ssa used the _safe variants of the loop. However, all of these no longer ever remove an entry so it's ok to make them all use the normal iterator. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	1b720c6ed8	nir/from_ssa: Clean up parallel copy handling and document it better Previously, we were doing a lazy creation of the parallel copy instructions. This is confusing, hard to get right, and involves some extra state tracking of the copies. This commit adds an extra walk over the basic blocks to add the block-end parallel copies up front. This should be much less confusing and, consequently, easier to get right. This commit also adds more comments about parallel copies to help explain what all is going on. As a consequence of these changes, we can now remove the at_end parameter from nir_parallel_copy_instr. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	de73d1e173	nir: Rename nir_block_following_if to nir_block_get_following_if The new name is a little longer but less confusing. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	cb53aacaa1	i965/fs_nir: Handle sample ID, position, and mask better Before, we were emitting the full pile of setup instructions for sample_id and sample_pos every time they were used. With this commit, we emit them in their own pass once at the beginning of the shader and simply emit uses later on. When it comes time for setting up VS, we can put setup for its special values in the same pass. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	813316d150	nir/opcodes: Remove the per_component info field Originally, this field was intended for determining if the given instruction acted per-component or if it had mismatching source and destination sizes that would have to be interpreted specially. However, we can easily derive this from output_size == 0, so it's not really that useful. Also, the values we were setting in nir_opcodes.h for this field were completely bogus and it was never used. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	e2a8f9e5cc	nir/search: Use nir_op_infos to determine if an operation is commutative Prior to this commit, we had a big switch statement for this. Now it's baked into the opcode metadata so we can just use that. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	46f3e1ab50	nir/opcodes: Add algebraic properties metadata This commit adds some algebraic properties to the metadata of each opcode in NIR. In particular, you now know, just from the metadata, if a given opcode is commutative or associative. This will be useful for algebraic transformation passes that want to be able to match a + b as well as b + a in one go. v2: Make algebraic properties all caps. This was more consistent with the intrinsics flags and seems better for flags in general. Also, the enums are now declared with (1 << n) rather than hex values. v3: fmin and fmax technically aren't commutative or associative. Things get funny when one of the arguments is a NaN. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	2c7da78805	nir: Make load_const SSA-only As it was, we weren't ever using load_const in a non-SSA way. This allows us to substantially simplify the load_const instruction. If we ever need a non-SSA constant load, we can do a load_const and an imov. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	675ffdef30	nir: Make nir_ssa_undef_instr_create initialize the destination Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	951a7f23a0	i965/nir: Move the other lowering passes to before out-of-SSA Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	5c16be1c52	nir/lower_system_values: Handle SSA destinations Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	821e75a160	nir/lower_atomics: Use/support SSA Previously, lower_atomics was non-SSA only. We assert-failed if the destination of an atomic operation intrinsic was an SSA def and we used temporary registers for computing offsets. This commit changes both of these behaviors. We now use SSA values for computing offsets (so we can optimize them) and we handle SSA destinations. We also move the pass to run before we go out of SSA on i965 as it now generates SSA values. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	8ddb03d56d	nir/live_variables: Use the new ssa_def iterator Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	28a3e164e2	nir: Use nir_foreach_ssa_def for setting up ssa destinations Before, we were using foreach_dest and switching on whether the destination was an SSA value. This works, except not all destinations are SSA values so we have to special-case ssa_undef instructions. Now that we have a foreach_ssa_def function, we can iterate over all of the register destinations in one pass and iterate over the SSA destinations in a second. This way, if we add other ssa-only instructions, we won't have to worry about adding them to the special case we have for ssa_undef. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	193fea9eb6	nir: Add a foreach_ssa_def function There are some functions whose destinations are SSA-only and so aren't a nir_dest. This provides a function that is capable of iterating over the SSA definitions defined by those functions. If you want registers, you should use the old iterator. v2: Kenneth Graunke <kenneth@whitecape.org>: - Fix nir_foreach_ssa_def's return value. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
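
The format_utils commit listed above (a4ab08bf45) replaces a plain bit shift with the rounded formula (x * 31 + 127) / 255 when narrowing 8 bits to 5. The following is a minimal sketch of that idea generalized to arbitrary widths. It is not the Mesa implementation: the function names are hypothetical, and it always uses 64-bit arithmetic for simplicity, whereas the commit message says the real code only switches to uint64_t when the multiplication would overflow an unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in `bits` bits, i.e. the value that
 * stands for 1.0 in an unsigned normalized format (bits in 1..32). */
static inline uint32_t max_unorm(unsigned bits)
{
    return bits >= 32 ? UINT32_MAX : (((uint32_t)1 << bits) - 1);
}

/* Behaviour described as the "previous code": shift off the low bits.
 * This treats 2^src_bits and 2^dst_bits as 1.0, which is slightly off. */
static inline uint32_t shrink_unorm_shift(uint32_t x, unsigned src_bits,
                                          unsigned dst_bits)
{
    assert(dst_bits < src_bits && src_bits <= 32);
    return x >> (src_bits - dst_bits);
}

/* Rounded conversion: (x * dst_max + src_max / 2) / src_max.
 * For 8 -> 5 bits this is exactly (x * 31 + 127) / 255.
 * Hypothetical helper; 64-bit math is used unconditionally here. */
static inline uint32_t shrink_unorm_round(uint32_t x, unsigned src_bits,
                                          unsigned dst_bits)
{
    assert(dst_bits > 0 && dst_bits < src_bits && src_bits <= 32);
    const uint64_t src_max = max_unorm(src_bits);
    const uint64_t dst_max = max_unorm(dst_bits);
    return (uint32_t)(((uint64_t)x * dst_max + src_max / 2) / src_max);
}
```

The two functions disagree on inputs such as 250: 250 / 255 * 31 is about 30.4, so the rounded version returns 30, while the shift returns 31.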

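Two of the commits above (a3b73ccf6d and 0c0ca8b6ae) factor the FNV-1a hash into reusable helpers and then use it, with loops rather than recursion, to hash deref chains. A rough sketch of what such helpers can look like is given below. The constants are the standard 32-bit FNV-1a offset basis and prime; the function names and the `chain_link` struct are illustrative assumptions, not Mesa's actual API or the NIR deref type.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a parameters. */
#define FNV32_OFFSET_BASIS 2166136261u
#define FNV32_PRIME        16777619u

/* Fold `size` bytes into a running FNV-1a hash: XOR each byte in,
 * then multiply by the prime. Start from FNV32_OFFSET_BASIS.
 * Hypothetical helper name. */
static inline uint32_t fnv32_1a_accumulate(uint32_t hash, const void *data,
                                           size_t size)
{
    const uint8_t *bytes = data;
    for (size_t i = 0; i < size; i++) {
        hash ^= bytes[i];
        hash *= FNV32_PRIME;
    }
    return hash;
}

/* Illustrative stand-in for a chain of nodes (NOT the NIR deref struct). */
struct chain_link {
    uint32_t id;
    const struct chain_link *child;
};

/* Hash a whole chain iteratively by reusing the accumulate helper. */
static inline uint32_t hash_chain(const struct chain_link *link)
{
    uint32_t hash = FNV32_OFFSET_BASIS;
    for (; link != NULL; link = link->child)
        hash = fnv32_1a_accumulate(hash, &link->id, sizeof(link->id));
    return hash;
}
```

Because the helper takes and returns the running hash, feeding data in several calls gives the same result as feeding it in one, which is what makes it easy to build other hashing functions on top of it.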

67569 commits