fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 02:58:06 +02:00

Author	SHA1	Message	Date
Rhys Perry	3e67aa2e4e	nir/load_store_vectorize: fix combining stores with aliasing loads between v2: add test Fixes: `ce9205c03b` ('nir: add a load/store vectorization pass') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1) Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v2)	2019-12-04 12:21:40 +00:00
Ian Romanick	fbd5359a0a	nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select Reviewed-by: Matt Turner <mattst88@gmail.com> All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14660366 -> 14653437 (-0.05%) instructions in affected programs: 316166 -> 309237 (-2.19%) helped: 905 HURT: 10 helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6 helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97% 95% mean confidence interval for instructions value: -7.91 -7.23 95% mean confidence interval for instructions %-change: -4.46% -3.99% Instructions are helped. total cycles in shared programs: 228571646 -> 228549759 (<.01%) cycles in affected programs: 56239919 -> 56218032 (-0.04%) helped: 681 HURT: 216 helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10 helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65% HURT stats (abs) min: 1 max: 320 x̄: 42.09 x̃: 14 HURT stats (rel) min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49% 95% mean confidence interval for cycles value: -41.51 -7.29 95% mean confidence interval for cycles %-change: -0.80% -0.49% Cycles are helped. LOST: 1 GAINED: 0	2019-12-02 16:46:20 -08:00
Ian Romanick	780b5c1037	nir/algebraic: Simplify some Inf and NaN avoidance code Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq should only return Inf when fsqrt returns 0. The changes are pretty small, but this turns a few hundred hurt shaders in the next patch into helped shaders. An alternative to the intBitsToFloat is to import numpy and do np.finfo(np.float32).max. That's more explicit, but we may also want to have specific bit encodings of float values later. I could be convinced either way, but intBitsToFloat(0x7f7fffff) was what I implemented first. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14661140 -> 14661104 (<.01%) instructions in affected programs: 7520 -> 7484 (-0.48%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.52% -0.47% Instructions are helped. total cycles in shared programs: 228585416 -> 228584806 (<.01%) cycles in affected programs: 56321 -> 55711 (-1.08%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10 helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65% 95% mean confidence interval for cycles value: -28.32 -9.80 95% mean confidence interval for cycles %-change: -1.63% -0.54% Cycles are helped. Sandy Bridge total cycles in shared programs: 152991077 -> 152991075 (<.01%) cycles in affected programs: 11525 -> 11523 (-0.02%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -5.27 4.27 95% mean confidence interval for cycles %-change: -0.16% 0.15% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45.	2019-12-02 16:46:20 -08:00
Ian Romanick	e342d6970b	nir/opt_peephole_select: Don't count some unary operations In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into another instruction as either source or destination modifiers. Counting them as instructions means that some if-statements won't get converted to selects. For example, vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x /* succs: block_1 block_2 / if ssa_25 { block block_1: / preds: block_0 / vec1 32 ssa_26 = fabs ssa_24 vec1 32 ssa_27 = fneg ssa_26 vec1 32 ssa_28 = fabs ssa_20 vec1 32 ssa_29 = fneg ssa_28 vec1 32 ssa_30 = fmul ssa_27, ssa_29 vec1 32 ssa_31 = fsat ssa_30 / succs: block_3 / } else { block block_2: / preds: block_0 / / succs: block_3 / } block block_3: / preds: block_1 block_2 */ block_1 isn't really 6 instructions, but it will be counted that way. Most callers of the peephole_select pass use either 1 or 8. It's very easy to blow way past either of these limits with things that are really only one or two actual instructions. I also tried some fancier things like making sure the fsat was of another SSA def from the same block, but the simple test was actually better. The i965 back-end SEL peephole pass still helps ~700 shaders in shader-db with this change. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com> All Gen6+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14743694 -> 14738910 (-0.03%) instructions in affected programs: 156575 -> 151791 (-3.06%) helped: 1204 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3 helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55% 95% mean confidence interval for instructions value: -4.12 -3.82 95% mean confidence interval for instructions %-change: -5.35% -4.95% Instructions are helped. total cycles in shared programs: 231749141 -> 231602916 (-0.06%) cycles in affected programs: 2818975 -> 2672750 (-5.19%) helped: 876 HURT: 322 helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220 helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44% HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20 HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70% 95% mean confidence interval for cycles value: -130.47 -113.64 95% mean confidence interval for cycles %-change: -14.85% -12.72% Cycles are helped. total sends in shared programs: 730495 -> 730491 (<.01%) sends in affected programs: 46 -> 42 (-8.70%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8122757 -> 8122617 (<.01%) instructions in affected programs: 14716 -> 14576 (-0.95%) helped: 46 HURT: 1 helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3 helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59% 95% mean confidence interval for instructions value: -3.42 -2.54 95% mean confidence interval for instructions %-change: -3.28% -1.62% Instructions are helped. total cycles in shared programs: 188510100 -> 188509780 (<.01%) cycles in affected programs: 58994 -> 58674 (-0.54%) helped: 32 HURT: 1 helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6 helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68% 95% mean confidence interval for cycles value: -16.34 -3.06 95% mean confidence interval for cycles %-change: -2.46% -0.15% Cycles are helped.	2019-12-02 16:46:19 -08:00
Rhys Perry	5404b7aaa3	nir/lower_io_to_vector: don't create arrays when not needed Some backends require that there are no array varyings. If there were no arrays in the input shader, the pass shouldn't have to create new ones. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103 Fixes: `bcd14756ee` ('nir/lower_io_to_vector: add flat mode') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-12-02 17:45:01 +00:00
Eric Anholt	d845dca0f5	nir: Make algebraic backtrack and reprocess after a replacement. The algebraic pass was exhibiting O(n^2) behavior in dEQP-GLES2.functional.uniform_api.random.3 and dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with other code-generated tests, and likely real-world loop-unroll cases). In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to transform: result = b2f(a == b); result = b2f(c == d); ... result = b2f(z == w); -> temp = (a == b) temp = temp && (c == d) ... temp = temp && (z == w) result = b2f(temp); nir_opt_algebraic, proceeding bottom-to-top, would match and convert the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to be matched by the next fmul down on the next time algebraic got run by the optimization loop. Back in 2016 in `7be8d07732` ("nir: Do opt_algebraic in reverse order."), Matt changed algebraic to go bottom-to-top so that we would match the biggest patterns first. This helped his cases, but I believe introduced this failure mode. Instead of reverting that, now that we've got the automaton, we can update the automaton's state recursively and just re-process any instructions whose state has changed (indicating that they might match new things). There's a small chance that the state will hash to the same value and miss out on this round of algebraic, but this seems to be good enough to fix dEQP. Effects with NIR_VALIDATE=0 (improvement is better with validation enabled): Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling outliers removed) dEQP-GLES2.functional.uniform_api.random.3 runtime -65.3512% +/- 4.22369% (n=21, was 1.4s) dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime -68.8066% +/- 6.49523% (was 4.8s) v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of tricky code. Runtime of uniform_api.random.3 down -0.790299% +/- 0.244213% compred to v1. v3: Re-add the nir_instr_remove() that I accidentally dropped in v2, fixing infinite loops. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-26 10:13:46 -08:00
Eric Anholt	90ad6304bf	nir: Refactor algebraic's block walk My motivation was to clarify the changes in the following commit, but incidentally, it reduces runtime of dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy testcase) by -5.39524% +/- 2.21179% (n=15) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-26 10:13:40 -08:00
Connor Abbott	305d1300f9	nir: Maintain the algebraic automaton's state as we work. In order to have nir_opt_algebraic be able to do further algebraic work on the output of a replacement, we need to maintain the automaton's state. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-11-26 10:13:19 -08:00
Eric Anholt	8afab607ac	nir: Add a scheduler pass to reduce maximum register pressure. This is similar to a scheduler I've written for vc4 and i965, but this time written at the NIR level so that hopefully it's reusable. A notable new feature it has is Goodman/Hsu's heuristic of "once we've started processing the uses of a value, prioritize processing the rest of their uses", which should help avoid the heuristic otherwise making such systematically bad choices around getting texture results consumed. Results for v3d: total instructions in shared programs: 6497588 -> 6518242 (0.32%) total threads in shared programs: 154000 -> 152828 (-0.76%) total uniforms in shared programs: 2119629 -> 2068681 (-2.40%) total spills in shared programs: 4984 -> 472 (-90.53%) total fills in shared programs: 6418 -> 1546 (-75.91%) Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2) v2: Use the DAG datastructure, fold in the scheduling-for-parallelism patch, include SSA defs in live values so we can switch to bottom-up if we want. v3: Squash in improvements from Alejandro Piñeiro for getting V3D to successfully register allocate on GLES3.1 dEQP. Make sure that discards don't move after store_output. Comment spelling fix.	2019-11-25 21:12:21 +00:00
Rhys Perry	0a759c3be6	nir: add load/store vectorizer tests v7: run nir_opt_algebraic v9: rework the callback function v9: update alignment on all loads/stores, even if they're not vectorized v10: add tests for 64-bit offsets v10: add tests for signed offsets Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)	2019-11-25 13:59:11 +00:00
Rhys Perry	ce9205c03b	nir: add a load/store vectorization pass This pass combines intersecting, adjacent and identical loads/stores into potentially larger ones and will be used by ACO to greatly reduce the number of memory operations. v2: handle nir_deref_type_ptr_as_array v3: assume explicitly laid out types for derefs v4: create less deref casts v4: fix shared boolean vectorization v4: fix copy+paste error in resources_different v4: fix extract_subvector() to pass nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64 v4: rebase v5: subtract from deref/offset instead of scheduling offset calculations v5: various non-functional changes/cleanups v5: require less metadata and preserve more v5: rebase v6: cleanup and improve dependency handling v6: emit less deref casts v6: pass undef to components not set in the write_mask for new stores v7: fix 8-bit extract_vector() with 64-bit input v7: cleanup creation of store write data v7: update align correctly for when the bit size of load/store increases v7: rename extract_vector to extract_component and update comment v8: prevent combining of row-major matrix column acceses v9: rework process_block() to be able to vectorize more v9: rework the callback function v9: update alignment on all loads/stores, even if they're not vectorized v9: remove entry::store_value, since it will not be updated if it's was from a vectorized load v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2 v9: handle nir_intrinsic_scoped_memory_barrier v10: use nir_ssa_scalar v10: handle non-32-bit offsets v10: use signed offsets for comparison v10: improve create_entry_key_from_offset() v10: support load_shared/store_shared v10: remove strip_deref_casts() v10: don't ever pass NULL to memcmp v10: remove recursion in gcd() v10: fix outdated comment v11: use the new nir_extract_bits() v12: remove use of nir_src_as_const_value in resources_different v13: make entry key hash function deterministic v13: simplify mask_sign_extend() v14: add comment in hash_entry_key() about hashing pointers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)	2019-11-25 13:59:11 +00:00
Rhys Perry	c14f823ee5	nir: add nir_num_variable_modes and nir_var_mem_push_const These will be useful in the upcoming load/store vectorizer. v11: rebase Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-25 13:59:11 +00:00
Brian Paul	a2689ebcd6	nir: no-op C99 _Pragma() with MSVC This fixes a build failure on MSVC. BTW, it looks like clang supports _Pragma() but I don't know if it understands the "gcc unroll N" directive. Signed-off-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-11-23 10:34:24 -07:00
Marek Olšák	ad40715f35	nir/serialize: support any num_components for remaining instructions Only NPOT vectors greater than vec4 use the extra uint32. This is for instructions that share the dest code. load_const and undef already support 1-16 in the header. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	c028449c01	nir/serialize: use 3 unused bits in intrinsic for packed_const_indices Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	3d44aed09e	nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a2df670b14	nir/serialize: serialize writemask for vec8 and vec16 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a5c5388234	nir/serialize: serialize swizzles for vec8 and vec16 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	f1a48d54ea	nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	487a495cc0	nir/serialize: remove up to 3 consecutive equal ALU instruction headers vec4 scalarized ALUs typically have 4 equal instruction headers, so remove the last 3. There are no bits left in the ALU header for more flags, so future extensions of NIR will have to use something like instr_type == 15 to describe more complex ALU instructions. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	c3fa9de2a9	nir/serialize: try to pack both deref array src into 32 bits Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	ed6b01d5e0	nir/serialize: cleanup - fold nir_deref_type_var cases into switches Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a0cd67d292	nir/serialize: try to put deref->var index into the unused bits of the header Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	ca201bfe70	nir/serialize: don't serialize mode for deref non-cast instructions It can be derived from src and var. This frees 10 bits in the header that will be used later. "mode" is moved in the structure, because those bits will be used for something else later. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	2286340fde	nir/serialize: don't store deref types if not needed - type_cast: deduplicate types if the last one is the same - derive the type from the parent for other derefs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	70a7f85149	nir/serialize: try to pack two alu srcs into 1 uint32 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	ef4630cf4f	nir/serialize: pack nir_intrinsic_instr::const_index[] better Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	d3346b275a	nir/serialize: pack 1-component constants into 20 bits if possible The majority of constants can be packed like this. v2: - use enum for the packing encoding, - trim packed_value to 20 bits add 1 bit to last_component, which simplifies a later commit Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	75f7c38863	nir/serialize: pack load_const with non-64-bit constants better v2: use blob_write_uint8/16 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a572ba673b	nir/serialize: try to store a diff in var data locations instead of var data Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	c8314678ee	nir/serialize: deduplicate serialized var types by reusing the last unique one Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	545415f45f	nir/serialize: don't serialize var->data for temporaries Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	c358c2b2bf	nir/serialize: pack src better and limit the object count to 1M from 1G We need to limit the object count to 1M to free 10 bits for the src modifiers. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	35655865cb	nir/serialize: pack instructions better Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Ian Romanick	ca353285cb	nir/range_analysis: Make sure the table validation only occurs once All of the tables are static const, so they only need to be validated once. As noted in the previous commit, the compiler should be able to eliminate all of this code when the assertions would pass. Even with the help of the previous commit, this does not always occur. -Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5 -O1: No difference proven at 95.0% confidence. N=5 -O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-11-22 08:16:06 -08:00
Ian Romanick	ccefce46cb	nir/range-analysis: Add pragmas to help loop unrolling I was pretty liberal with these assertions when I wrote this code because I had assumed that GCC would unroll the loops, inline the look ups of static const arrays with now constant indices, and then elmininate all the actuall assertions. It seems none of this happens even at -O3. Adding the pragmas helps encourage loop unrolling at some optimization levels. I tested by running shader-db with NIR_VALIDATE=false on a Core i7 Haswell desktop system. -Og: No difference proven at 95.0% confidence. N=5 -O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5 -O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5 v2: Add a _Pragma to an inner loop that was accidentally dropped during a rebase. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-11-22 08:16:06 -08:00
Alyssa Rosenzweig	deaebc82a7	nir: Add load_sampler_lod_paramaters_pan intrinsic This loads in the <min_lod, max_lod, lod_bias> settings for a given sampler, which is necessary for lowering clamps/biases on certain Midgard chips. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-11-22 05:07:19 +00:00
Marek Olšák	0b1452ffdd	nir/serialize: do ctx = {0} instead of manual initializations Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-21 18:49:57 -05:00
Marek Olšák	ff71fae440	nir: strip as we serialize to remove the nir_shader_clone call Serializing stripped NIR is faster now. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-21 18:49:57 -05:00
Dave Airlie	cce07ea835	nir: fix deref offset builder Use the correct bit size Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-22 04:37:41 +10:00
Dave Airlie	7325f6ac98	vtn/opencl: add clz support This is needed for OpenCL Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-22 04:37:41 +10:00
Dave Airlie	d0d96053e6	nir: add 64-bit ufind_msb lowering support. (v2) This adds the option to lower 64-bit ufind_msb opcodes. v2: use split_x/y removes component loops (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-22 04:37:37 +10:00
Dave Airlie	12913bcf86	spirv/nir/opencl: handle some multiply instructions. This adds support for some missing 24-bit and hi multiply variants. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-22 04:37:25 +10:00
Karol Herbst	5934a53bfe	nir/validate: validate num_components on registers and intrinsics also make 8 and 16 compoments invalid. We will enable that later again when we actually support it. v2: fix validation of nir_intrinsic_instr::num_components correct validation of instr->num_components Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-21 01:10:24 +01:00
Rhys Perry	ca2de7ae9c	nir/large_constants: use nir_index_vars and nir_variable::index Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-20 15:05:42 +00:00
Rhys Perry	9f92e8b721	nir: add nir_variable::index and nir_index_vars This will be useful as a deterministic identifier/index for the variable. v2: fix comment style Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)	2019-11-20 15:05:42 +00:00
Rhys Perry	45a0b53490	nir: make nir_variable::{num_members,num_state_slots} a uint16_t Doesn't shrink it (at least, on x86-64) and leaves space for more members. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-20 15:05:42 +00:00
Neil Roberts	f6b5abe91a	nir/lower_alu_to_scalar: Support lowering 8- and 16-bit reduce ops Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00
Neil Roberts	634eb9c04b	nir: Add a 8-bit bool type Adds nir_type_bool8 as well as 8-bit versions of all the bool opcodes. Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00
Neil Roberts	0f5640c577	nir: Add a 16-bit bool type Adds nir_type_bool16 as well as 16-bit versions of all the bool opcodes. Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00

1 2 3 4 5 ...

2003 commits