This will make the options visible under nir_shader->options, which
will allow the gallium state_tracker to use the driver's preferred
settings during glsl_to_nir.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.
In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?
As far as I can tell, it's just because our scheduler is terrible. In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 versions would be hurt.
In many of those cases, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).
Often it looks like the scheduler decides to schedule a SEND that occurs
somewhere early in the shader differently. Once that happens, everything
is different.
I looked at one vertex shader that was hurt (from Goat Simulator). In
that case, both the floor and fract are used. The optimization
eliminates the add, and it should allow better scheduling. In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and an ADD get scheduled
differently, and that makes it slightly worse.
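As a hedged illustration (assuming the optimization in question is the
a - floor(a) -> fract(a) rewrite; the helper below is hypothetical, not
taken from the shader):
#include <math.h>
/* When a shader needs both pieces, the subtract can be dropped while the
 * floor stays. */
static void
split_float(float a, float *int_part, float *frac_part)
{
   *int_part  = floorf(a);     /* RNDD on Intel hardware */
   *frac_part = a - *int_part; /* this FADD can collapse into a single FRC */
}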
In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions. It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.
total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.
Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
I have not investigated the result of doing this during code
generation. That should be possible, but it would be a bit more
effort.
All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2
total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2
This change has almost no effect right now. However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:
Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2
total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike the direct case, indirect array derefs of vectors are handled
like regular derefs, with the exception that we ignore any vector entry
that has SSA values when performing a load. Such SSA values don't help
loading the indirect unless we emit an if-ladder.
Copy_derefs are supported for indirects.
Also enable two tests that now pass.
v2: Remove unnecessary temporaries. Be clearer when identifying the
case where copy_entry doesn't help when we are dealing with an
indirect array_deref (of a vector). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When looking up an entry to use, always prefer an equal match, as it is
more likely to contain reusable SSA values or derefs to propagate.
This will be necessary when adding entries with array derefs of
vectors, because we don't want the whole-vector entry when an equal
entry (an array deref of that vector) is present.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access. The vector tests are disabled and
will be enabled by a later commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart enough to propagate values it knows
about.
Given a 'vec4 v', storing to v[3] will update the copy entry for v,
since it is equivalent to a write to v.w. Loading from v[1] will first
check whether there's a known value for v.y -- and drop the load in that
case.
The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.
It might now be the case that certain entries not only have different
SSA defs in each element, but those defs may also come from components
other than the ones they are set to, because stores to individual
elements always come from an SSA definition with a single component.
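A minimal sketch of the bookkeeping described above (hypothetical names
and types, not the actual pass code):
/* One copy entry for a vec4 variable: a known value per channel, or NULL. */
struct entry_sketch {
   void *channel[4];   /* stands in for nir_ssa_def*, indexed x..w */
};
/* store v[3] = x  ->  only the .w slot of the entry changes */
static void
store_channel(struct entry_sketch *e, unsigned index, void *value)
{
   e->channel[index] = value;
}
/* load v[1]  ->  reuse the known .y value if we have one; a null result
 * means the load has to stay */
static void *
try_load_channel(const struct entry_sketch *e, unsigned index)
{
   return e->channel[index];
}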
Tests related to these cases are now enabled.
v2: Instead of asserting on invalid indices, "load" an undef and
remove the store. (Jason)
v3: Merge code path for the cases of is_array_deref_of_vector into the
regular code path. Add a base_index parameter to
value_set_from_value. (code changes by Jason)
v4: Removed the get_entry_for_deref helper, now being used only once.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also replace uses of 0xf with the appropriate full mask created from
the number of components.
Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.
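For instance, a minimal sketch of the mask computation (hypothetical
helper name):
/* Full write mask for an N-component value; N == 4 gives the old 0xf. */
static unsigned
full_mask(unsigned num_components)
{
   return (1u << num_components) - 1;
}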
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The name reflected this function's role back when the pass also did dead
write elimination. So rename it to reflect what it does now, which is
setting a value using another value, and narrow the argument list.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.
This fixes a nasty case with a loop containing a branch whose then-part
and else-part both exit the loop:
%1 = OpLabel
OpLoopMerge %2 %3 None
OpBranchConditional %false %2 %2
%3 = OpLabel
OpBranch %1
%2 = OpLabel
[...]
We know that block %1 will always branch to block %2, which is the merge
block for the loop, and thus a break is emitted. If we keep processing
further instructions, we will process the branch conditional and emit the
corresponding NIR conditional, which leads to instructions after the
break.
This fixes dEQP-VK.graphicsfuzz.continue-and-merge.
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.
Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.
This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.
Fixes: edded12376 ("mesa: rework ParameterList to allow packing")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is useful for r600, where the abs source modifier is not supported
for ops with three sources.
v2: Use correct logic to enable lowering to abs source mod (Eric Anholt)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The memory layout associated with this format would be:
Byte:      0  1  2  3
Component: V  U  Y  X
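As a hedged illustration (assuming the bytes above describe the memory
order of a little-endian-packed 32-bit word; pack_vuyx is a hypothetical
helper, not code from the patch):
#include <stdint.h>
static uint32_t
pack_vuyx(uint8_t v, uint8_t u, uint8_t y, uint8_t x)
{
   /* byte 0 = V, byte 1 = U, byte 2 = Y, byte 3 = X (padding) */
   return (uint32_t)v | ((uint32_t)u << 8) |
          ((uint32_t)y << 16) | ((uint32_t)x << 24);
}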
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes following valgrind warning:
==27561== Conditional jump or move depends on uninitialised value(s)
==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The nir_builder swizzling improvement to not emit extra MOVs resulted in
nir_lower_tex() trying to rewrite an SSA def to itself, triggering the
assert on all texturing in v3d. There's no work to be done in this case,
so just stop asserting.
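A minimal sketch of the guard (hypothetical helper, not the exact
nir_lower_tex change):
#include "nir.h"
static void
rewrite_if_different(nir_ssa_def *def, nir_ssa_def *replacement)
{
   if (def == replacement)
      return;   /* no-op swizzle: nothing to rewrite */
   nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(replacement));
}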
Fixes: 743700be1f ("nir/builder: Don't emit no-op swizzles")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is a common pattern from HLSL->SPIR-V translation and is supported
in HW by all current NIR backends.
vkpipeline-db results on anv (SKL):
total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
instructions in affected programs: 204084 -> 203334 (-0.37%)
helped: 208
HURT: 0
total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
helped: 107
HURT: 86
shader-db results on i965 (KBL):
total instructions in shared programs: 15284592 -> 15284568 (<.01%)
instructions in affected programs: 81683 -> 81659 (-0.03%)
helped: 24
HURT: 0
total cycles in shared programs: 375013622 -> 375013932 (<.01%)
cycles in affected programs: 40169618 -> 40169928 (<.01%)
helped: 13
HURT: 9
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.
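For instance, a hedged sketch of the SM5 behaviour for a 32-bit left
shift (not the actual lowering code):
#include <stdint.h>
static uint32_t
sm5_shl(uint32_t x, uint32_t count)
{
   /* Only the five least significant bits of the count participate, so
    * the result is defined for any count value. */
   return x << (count & 31);
}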
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
apply_implicit_conversion only converts and checks base types, but we
need actual type equality for function returns; otherwise you can return
a vec2 from a function declared as returning a float.
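For example (a hypothetical shader that must now be rejected):
/* Only the base type (float) matches here; the full type (vec2 vs.
 * float) does not, so this return is invalid. */
static const char *bad_glsl =
   "float f() {\n"
   "   return vec2(1.0);\n"
   "}\n";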
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The nir_swizzle helper is used a bit on its own, but it's also called by
nir_channel and nir_channels, which are used everywhere. It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle. If it is, we now don't bother emitting the
instruction. Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.
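A minimal sketch of the check (hypothetical helper, not the nir_builder
code itself):
#include <stdbool.h>
static bool
swizzle_is_identity(const unsigned swiz[], unsigned num_channels,
                    unsigned src_components)
{
   /* An identity swizzle reads every source channel, in order. */
   if (num_channels != src_components)
      return false;
   for (unsigned i = 0; i < num_channels; i++) {
      if (swiz[i] != i)
         return false;
   }
   return true;
}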
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Test using array deref on vectors in loads and stores. These are
marked DISABLED_ as this optimization is currently not done.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index). This makes it slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about. The cost is to perform more traversals,
but for such tests this is not a problem.
Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.
Also, drop two leftover nir_print_shader calls in a test.
v2: Remove redundant assertions. nir_src_comp_as_uint already
assert what we need. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from. At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size. This prepares the code for array_derefs of vectors, in which
'component[i] != i'.
Also, extract setting all SSA components into a function of its own.
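A hedged sketch of the resulting layout and helper (field and function
names approximate, not the exact code in the pass):
#include <stdint.h>
/* For an SSA-backed copy_entry, each destination channel remembers both
 * the def that provides it and which component of that def it comes from. */
struct value_sketch {
   void    *def[4];        /* stands in for nir_ssa_def*, one per channel */
   uint8_t  component[4];  /* source channel inside def[i]; today == i    */
};
/* "Set all SSA components" helper mentioned above, in sketch form. */
static void
value_set_all(struct value_sketch *v, void *def, unsigned num_components)
{
   for (unsigned i = 0; i < num_components; i++) {
      v->def[i] = def;
      v->component[i] = (uint8_t)i;
   }
}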
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Disabled by default, to be used during development. Adding those
so I don't rewrite some ad-hoc version of them every time I'm working
with this pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations. For load_derefs,
do nothing. For store_derefs, invalidate whatever the store is
writing to. For copy_derefs, invalidate whatever the copy is writing
to.
These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than only lowering if all srcs are scalarizable, we instead
check that at least one src is scalarizable.
We change the undef type to return false, otherwise it will cause
regressions when it is the only scalarizable src.
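A hedged sketch of the new condition (hypothetical helper, not the pass
code itself):
#include <stdbool.h>
static bool
should_lower(const bool src_scalarizable[], unsigned num_srcs)
{
   /* Lower as soon as one src is scalarizable, instead of requiring all
    * of them to be. */
   for (unsigned i = 0; i < num_srcs; i++) {
      if (src_scalarizable[i])
         return true;
   }
   return false;
}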
total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74
total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131
total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0
total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reduces the time spent in nir_opt_cse() by almost a half.
The massif tool from valgrind reported no change in peak
memory use with the large dolphin uber shaders I used for
testing.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Copy+paste error. It was supposed to test cull and not clip.
Fixes: 4e69fba534 "nir: Rewrite lower_clip_cull_distance_arrays..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so it is somewhat tied to a specific variable. But in
practice it behaves as a global of the shader. In ARB programs it is set
using a global OPTION (defined by ARB_fragment_coord_conventions), and
in SPIR-V using ExecutionModes, which are also not tied specifically to
the builtin.
This patch moves that info from the nir variable and ir variable to the
nir shader and gl_program shader_info respectively, so the mapping is
more similar to SPIR-V and ARB programs than to GLSL.
FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as they would be superfluous.
This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.
Fixes: e68871f6a ("spirv: Handle constants and types before execution
modes")
v2: (Jason)
* glsl_to_nir: get the info before glsl_to_nir, while all the rest
of the info gathering is happening
* prog_to_nir: gather the info on a general info-gathering pass,
not on variable setup.
v3: (Jason)
* Squash with the patch that removes that info from ir variable
* anv: assert that OriginUpperLeft is true. It should already be
set by spirv_to_nir.
* blorp: set origin_upper_left on its core "compile fragment
shader", not just on some specific places (for this we added an
helper on a previous patch).
* prog_to_nir: no need to specifically gather these fragcoord modes
as the full gl_program shader_info is copied.
* spirv_to_nir: assert that we are a fragment shader when handling
these execution modes.
v4: (reported by failing gitlab pipeline #18750)
* state_tracker: update too due to changes on ir.h/gl_program
v5:
* blorp: minor change after the change in the previous patch
* radeonsi: update due to this change.
v6: (Timothy Arceri)
* prog_to_nir: remove extra whitespace
* shader_info: don't use :1 on origin_upper_left
* glsl: program.fs.origin_upper_left/pixel_center_integer can be
moved out of the shader list loop
This makes us properly handle gl_ClipDistance and gl_CullDistance.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We needed to better handle cases where a chunk of a variable starts at
some non-zero location_frac and rolls over into the next slot but may
not be more than 4 dwords. For example, if gl_CullDistance is an array
of 3 things and has location_frac = 2, it will span across two vec4s but
is not, itself, bigger than a vec4. If you ignore the clip/cull special
case, it's not allowed to happen for anything else because the only
things that can span more than one slot are dvec3 and dvec4, and they're
both bigger than a vec4. The current code uses this attrib_slot thing
where we count attribute slots and iterate over them. However, that
doesn't work in the case above because gl_CullDistance will have an
attrib_slot count of 1 even though it does span two slots. We could fix
this by adjusting attrib_slot but we already have comp_mask and it's
easier to just handle it that way.
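A hedged sketch of the spill-over case (hypothetical helper; bit i of a
mask stands for component i):
static void
component_masks(unsigned location_frac, unsigned num_components,
                unsigned *first_slot_mask, unsigned *second_slot_mask)
{
   /* e.g. a 3-element gl_CullDistance at location_frac 2 gives
    * first_slot_mask == 0xc (zw) and second_slot_mask == 0x1 (x). */
   unsigned mask = ((1u << num_components) - 1) << location_frac;
   *first_slot_mask  = mask & 0xf;
   *second_slot_mask = mask >> 4;
}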
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Instead of going to all the work of combining them into one array, just
make two arrays and use location_frac to colocate them within CLIP0.
Then the back-end can sort things out and stack them on top of each
other. Thanks to ef99f4c8, we also don't need to set compact anymore.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even in a very basic shader this reduces the time spent in
nir_copy_prop() by ~17%.
No shader-db changes for radeonsi NIR or i965.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Commit 08bfd710a2 (nir/dead_cf: Stop relying on liveness analysis)
introduced a new check that iterates over an SSA def's uses to see if
it's used. But it only checked normal uses, not uses that are part of an
'if' condition. This led to it thinking more nodes were dead than
actually were.
Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test
(and related tests) with the out-of-tree Iris driver.
Fixes: 08bfd710a2 ("nir/dead_cf: Stop relying on liveness analysis")
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'd like to use this in the prog_parameter.c code, so I need to move it
into C, make it non-static, and so on. This probably isn't the ideal
place for it, but I couldn't think of a better one.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
The idea here is to reassociate a * (b * c) into (a * c) * b, when
b is a non-constant value, but a and c are constants, allowing them
to be combined.
But nothing was enforcing that 'b' must be non-constant, which meant
that running opt_algebraic in a loop would never terminate if the IR
contained non-folded constant expressions like 256 * 0.5 * 2. Normally,
we call constant folding in such a loop too, but IMO it's better for
nir_opt_algebraic to be robust and not rely on that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581
Fixes: 32e266a9a5 ("i965: Compile fp64 funcs only if we do not have 64-bit hardware support")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was probably useful when it was first written; however, it looks
to no longer be necessary.
As far as I can tell, these days dce is smart enough to remove useless
instructions from if branches. Once this is done,
nir_opt_peephole_select() will end up removing the empty if.
Removing this support reduces the dolphin uber shader compilation
time spent in nir_opt_dead_cf() by a little over 7x.
No shader-db changes on i965 or radeonsi.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In the opt_peel_initial_if optimization, when moving the continue list
to the end of the continue block, before the jump, it can happen that
the continue list itself also ends with a jump.
This would mean that we would have two jump instructions in a row: the
first one from the continue list and the second one from the continue
block.
As inserting an instruction after a jump is not allowed (and it does not
make sense, as it will not be executed), remove the jump from the
continue block and keep the one from the continue list, as it will be
executed first.
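A minimal sketch of the fix (hypothetical helper name, not the exact
pass code):
#include "nir.h"
static void
drop_block_jump_if_list_jumps(nir_block *continue_block,
                              bool continue_list_ends_in_jump)
{
   nir_instr *last = nir_block_last_instr(continue_block);
   /* If the spliced continue list already ends in a jump, the block's
    * own trailing jump would end up after it, which is not allowed and
    * would never execute anyway, so remove it. */
   if (continue_list_ends_in_jump &&
       last != NULL && last->type == nir_instr_type_jump)
      nir_instr_remove(last);
}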
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
opt_split_alu_of_phi moves an ALU instruction to the end of the continue
block. But if the continue block ends with a jump instruction (an
explicit "continue"), then the ALU must be inserted before the jump, as
it is illegal to add instructions after a jump.
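A minimal sketch of the insertion-point choice (hypothetical helper, not
the exact pass code):
#include "nir.h"
static nir_cursor
cursor_for_continue_block(nir_block *block)
{
   nir_instr *last = nir_block_last_instr(block);
   /* Insert before a trailing jump if there is one; otherwise append. */
   if (last != NULL && last->type == nir_instr_type_jump)
      return nir_before_instr(last);
   return nir_after_block(block);
}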
CC: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The liveness analysis pass is fairly expensive because it has to build
large bit-sets and run a fix-point algorithm on them. Instead of
requiring liveness to detect whether values escape a CF node, just take
advantage of the structured nature of NIR and use block indices instead.
This only requires the block index metadata, which is the fastest
metadata we have to generate.
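A hedged sketch of the escape test (hypothetical helper name; it assumes
nir_metadata_block_index has been computed so that the blocks inside the
CF node form a contiguous index range):
#include "nir.h"
static bool
def_escapes_range(nir_ssa_def *def, unsigned first_block, unsigned last_block)
{
   nir_foreach_use(src, def) {
      unsigned idx = src->parent_instr->block->index;
      if (idx < first_block || idx > last_block)
         return true;
   }
   nir_foreach_if_use(src, def) {
      /* Conservatively treat uses as an 'if' condition as escaping. */
      return true;
   }
   return false;
}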
No shader-db changes on Kaby Lake
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>