fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Marek Olšák	f798513f91	nir: add i2imp and u2ump opcodes for conversions to mediump Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5002>	2020-06-02 20:01:18 +00:00
Alyssa Rosenzweig	fcbc022787	nir: Add un/pack_32_4x8 opcodes Complement the existing un/pack_32_2x16 opcodes. These are useful for 8-bit format packing. On Midgard, they are equivalent to just a 32-bit move, but other GPUs could lower to other packs if needed. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5107>	2020-05-25 20:03:52 +00:00
Alyssa Rosenzweig	c2b0f3c17d	nir: Add fclamp_pos opcode Corresponds to the .pos modifier on all Mali GPUs (lima and panfrost). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>	2020-05-19 20:21:27 +00:00
Alyssa Rosenzweig	0aedce417a	nir: Add fsat_signed opcode Exists on later Mali. Equivalent to clamp(x, -1.0, 1.0) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>	2020-05-19 20:21:27 +00:00
Rhys Perry	abc4a82857	nir: make fsat return 0.0 with NaN instead of passing it through This is how lower_fsat and ACO implements fsat and is a more useful definition since it can be exactly created from fmin(fmax(a, 0.0), 1.0). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3716>	2020-05-07 10:39:19 +00:00
Gert Wollny	49ce749d0e	nir: Add umad24 and umul24 opcodes So far only the singed versions are defined. v2: Make umad24 and umul24 non-driver specific (Eric Anholt) v3: Take care of nir_builder and automatic lowering of the opcodes if they are not supported by the backend. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>	2020-04-23 18:23:04 +00:00
Jason Ekstrand	7c43b8ce1b	nir: Delete the fnoise opcodes As of the previous commit, they are never used. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>	2020-04-21 06:16:13 +00:00
Rob Clark	bf64648864	nir: fix definition of imadsh_mix16 for vectors Fixes: `c27b3758fa` ("nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes") Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>	2020-04-04 00:07:10 +00:00
Jason Ekstrand	b2db84153a	nir: Add b2b opcodes These exist to convert between different types of boolean values. In particular, we want to use these for uniform and shared memory operations where we need to convert to a reasonably sized boolean but we don't care what its format is so we don't want to make the back-end insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the format of the uniform boolean to whatever the driver wants. In the case of shared, every value in a shared variable comes from the shader so it's already in the right boolean format. The new boolean conversion opcodes get replaced with mov in lower_bool_to_int/float32 so the back-end will hopefully never see them. However, while we're in the middle of optimizing our NIR, they let us have sensible load_uniform/ubo intrinsics and also have the bit size conversion. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Albert Astals Cid	d988061172	cube_face_index: Use fabsf instead of fabs since we know it's floats Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>	2020-02-26 21:47:01 +00:00
Albert Astals Cid	6db7467b59	cube_face_coord: Use fabsf instead of fabs since we know it's floats Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>	2020-02-26 21:47:01 +00:00
Neil Roberts	125f867d3d	nir/opcodes: Add nir_op_f2fmp This opcode is the same as the f2f16 opcode except that it comes with a promise that it is safe to optimise it out if the result is immediately converted back to a 32-bit float again. Normally this would be a lossy conversion and so it would be visible to the application, but if the conversion is generated as part of the mediump lowering process then this removal doesn’t matter. The opcode is eventually replaced with a regular f2f16 in the late optimisations so the backends don’t need to handle it. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>	2020-02-24 17:24:13 +00:00
Ian Romanick	1d97d186fb	nir: Mark fmin and fmax as commutative and associative Per the resolution of Khronos GLSL issue 80 (https://github.com/KhronosGroup/GLSL/issues/80). Spec updates have not landed yet, but I'll get to it soon. :) The extra hurt shaders on Gen8+ are a handful of shaders that see things like bcsel(fmin(b - a, a - c) >= 0, x, y) converted to bcsel(a >= b && c >= a, x, y) The former can be generated as a CSEL instruction. If either b - a or a - c is used elsewhere in the shader, this saves an instruction. All Haswell+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14550188 -> 14550048 (<.01%) instructions in affected programs: 12168 -> 12028 (-1.15%) helped: 30 HURT: 3 helped stats (abs) min: 1 max: 17 x̄: 4.77 x̃: 2 helped stats (rel) min: 0.05% max: 3.85% x̄: 1.77% x̃: 1.80% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.50% max: 0.50% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for instructions value: -6.15 -2.33 95% mean confidence interval for instructions %-change: -2.00% -1.12% Instructions are helped. total cycles in shared programs: 203770286 -> 203771464 (<.01%) cycles in affected programs: 688466 -> 689644 (0.17%) helped: 172 HURT: 220 helped stats (abs) min: 1 max: 286 x̄: 12.15 x̃: 6 helped stats (rel) min: 0.03% max: 5.97% x̄: 0.70% x̃: 0.35% HURT stats (abs) min: 1 max: 578 x̄: 14.85 x̃: 6 HURT stats (rel) min: 0.03% max: 32.36% x̄: 1.21% x̃: 0.52% 95% mean confidence interval for cycles value: -0.74 6.75 95% mean confidence interval for cycles %-change: 0.15% 0.59% Inconclusive result (value mean confidence interval includes 0). total fills in shared programs: 4525 -> 4523 (-0.04%) fills in affected programs: 48 -> 46 (-4.17%) helped: 1 HURT: 0 Ivy Bridge total instructions in shared programs: 11858995 -> 11858898 (<.01%) instructions in affected programs: 10822 -> 10725 (-0.90%) helped: 25 HURT: 13 helped stats (abs) min: 1 max: 17 x̄: 5.32 x̃: 2 helped stats (rel) min: 0.40% max: 5.00% x̄: 2.16% x̃: 1.85% HURT stats (abs) min: 1 max: 15 x̄: 2.77 x̃: 2 HURT stats (rel) min: 0.47% max: 2.90% x̄: 1.83% x̃: 2.15% 95% mean confidence interval for instructions value: -4.66 -0.45 95% mean confidence interval for instructions %-change: -1.54% -0.05% Instructions are helped. total cycles in shared programs: 177947023 -> 177946880 (<.01%) cycles in affected programs: 822075 -> 821932 (-0.02%) helped: 157 HURT: 175 helped stats (abs) min: 1 max: 164 x̄: 13.17 x̃: 4 helped stats (rel) min: 0.03% max: 6.72% x̄: 0.64% x̃: 0.17% HURT stats (abs) min: 1 max: 308 x̄: 11.00 x̃: 4 HURT stats (rel) min: 0.03% max: 9.76% x̄: 0.70% x̃: 0.18% 95% mean confidence interval for cycles value: -3.86 3.00 95% mean confidence interval for cycles %-change: -0.09% 0.22% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4185 -> 4188 (0.07%) spills in affected programs: 146 -> 149 (2.05%) helped: 0 HURT: 1 total fills in shared programs: 5248 -> 5249 (0.02%) fills in affected programs: 347 -> 348 (0.29%) helped: 0 HURT: 1 Sandy Bridge total instructions in shared programs: 10680224 -> 10680144 (<.01%) instructions in affected programs: 4702 -> 4622 (-1.70%) helped: 15 HURT: 3 helped stats (abs) min: 1 max: 17 x̄: 5.53 x̃: 5 helped stats (rel) min: 0.39% max: 4.76% x̄: 2.17% x̃: 1.67% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.52% max: 0.52% x̄: 0.52% x̃: 0.52% 95% mean confidence interval for instructions value: -7.24 -1.65 95% mean confidence interval for instructions %-change: -2.55% -0.89% Instructions are helped. total cycles in shared programs: 152988780 -> 152985691 (<.01%) cycles in affected programs: 1072850 -> 1069761 (-0.29%) helped: 168 HURT: 145 helped stats (abs) min: 1 max: 592 x̄: 33.90 x̃: 12 helped stats (rel) min: 0.02% max: 10.73% x̄: 0.90% x̃: 0.31% HURT stats (abs) min: 1 max: 259 x̄: 17.98 x̃: 6 HURT stats (rel) min: 0.02% max: 8.17% x̄: 0.77% x̃: 0.19% 95% mean confidence interval for cycles value: -17.95 -1.79 95% mean confidence interval for cycles %-change: -0.34% 0.08% Inconclusive result (%-change mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8107033 -> 8107025 (<.01%) instructions in affected programs: 696 -> 688 (-1.15%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.60 x̃: 2 helped stats (rel) min: 0.34% max: 7.14% x̄: 3.47% x̃: 4.65% 95% mean confidence interval for instructions value: -2.28 -0.92 95% mean confidence interval for instructions %-change: -7.22% 0.28% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 188348526 -> 188348404 (<.01%) cycles in affected programs: 33618 -> 33496 (-0.36%) helped: 23 HURT: 0 helped stats (abs) min: 2 max: 12 x̄: 5.30 x̃: 6 helped stats (rel) min: 0.05% max: 1.83% x̄: 0.47% x̃: 0.51% 95% mean confidence interval for cycles value: -6.70 -3.91 95% mean confidence interval for cycles %-change: -0.64% -0.30% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1359>	2020-02-10 18:37:36 -08:00
Ian Romanick	21f0d020fe	nir: Add new instructions for INTEL_shader_integer_functions2 uctz isn't added because it will implemented in the GLSL path and the SPIR-V path using other pre-existing instructions. v2: Avoid signed integer overflow for uabs_isub(0, INT_MIN). Noticed by Caio. v3: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN). I tried the previous methon in a small test program with -ftrapv, and it failed. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>	2020-01-23 00:18:57 +00:00
Rob Clark	a8ec4082a4	nir+vtn: vec8+vec16 support This introduces new vec8 and vec16 instructions (which are the only instructions taking more than 4 sources), in order to construct 8 and 16 component vectors. In order to avoid fixing up the non-autogenerated nir_build_alu() sites and making them pass 16 src args for the benefit of the two instructions that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is used for the > 4 src args case). v2 (Karol Herbst): use nir_build_alu2 for vec8 and vec16 use python's array multiplication syntax add nir_op_vec helper simplify nir_vec nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert use nir_build_alu for opcodes with <= 4 sources v3 (Karol Herbst): fix nir_serialize v4 (Dave Airlie): fix serialization of glsl_type handle vec8/16 in lowering of bools v5 (Karol Herbst): fix load store vectorizer Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-12-21 11:00:17 +00:00
Neil Roberts	634eb9c04b	nir: Add a 8-bit bool type Adds nir_type_bool8 as well as 8-bit versions of all the bool opcodes. Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00
Neil Roberts	0f5640c577	nir: Add a 16-bit bool type Adds nir_type_bool16 as well as 16-bit versions of all the bool opcodes. Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00
Neil Roberts	2ec97e78a9	nir/opcodes: Add a helper function to generate reduce opcodes Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit versions of the reduce operation. This reduces the code duplication a bit and will make it easier to later add 16-bit versions as well. Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00
Neil Roberts	9a96afb97e	nir/opcodes: Add a helper function to generate the comparison binops Adds binop_compare_all_sizes which generates both 1-bit and 32-bit versions of the comparison operation. This reduces the code duplication a bit and will make it easier to later add 16-bit versions as well. Reviewed-by: Rob Clark <robdclark@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-20 14:09:43 +01:00
Rob Clark	6320e37d4b	nir: add amul instruction Used for address/offset calculation (ie. array derefs), where we can potentially use less than 32b for the multiply of array idx by element size. For backends that support `imul24`, this gives a lowering pass an easy way to find multiplies that potentially can be converted to `imul24`. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2019-10-18 15:08:54 -07:00
Rob Clark	0568761f8e	nir: Add a new ALU nir_op_imul24 Some hardware can do 24b multiply in a single instruction, but not 32b. However in most cases 24b is sufficient for address/offset calculation. Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2019-10-18 15:08:54 -07:00
Eduardo Lima Mitev	32e5fbf47c	nir: Add a new ALU nir_op_imad24_ir3 ir3 compiler has a signed integer multiply-add instruction (MAD_S24) that is used for different offset calculations in the backend. Since we intend to move some of these calculations to NIR, we need a new ALU op that can directly represent it. Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2019-10-18 15:08:54 -07:00
Andres Gomez	d9760f8935	nir/opcodes: Clear variable names confusion Having Python and C variables sharing name in the same block of code makes its understanding a bit confusing. Make it explicit that the Python bit_size variable refers to the destination bit size. Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 23:59:07 +03:00
Samuel Iglesias Gonsálvez	dba4d0a319	nir: fix fmin/fmax support for doubles Until now, it was using the floating point version of fmin/fmax, instead of the double version. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez	1e0e3ed15a	nir: fix denorms in unpack_half_1x16() According to VK_KHR_shader_float_controls: "Denormalized values obtained via unpacking an integer into a vector of values with smaller bit width and interpreting those values as floating-point numbers must: be flushed to zero, unless the entry point is declared with the code:DenormPreserve execution mode." v2: - Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor). v3: - Adapt to use the new NIR lowering framework (Andres). v4: - Updated to renamed shader info member and enum values (Andres). v5: - Simplify flags logic operations (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]	2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez	ef681cf971	nir/opcodes: make sure f2f16_rtz and f2f16_rtne behavior is not overriden by the float controls execution mode Suggested-by: Connor Abbott <cwabbott0@gmail.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez	7580707345	nir: mind rounding mode on fadd, fsub, fmul and fma opcodes According to Vulkan spec, the new execution modes affect only correctly rounded SPIR-V instructions, which includes fadd, fsub and fmul. v2: - Fix fmul, fsub and fadd round-to-zero definitions, they should use auxiliary functions to calculate the proper value because Mesa uses round-to-nearest-even rounding mode by default (Connor). v3: - Do an actual fused multiply-add at ffma (Connor). v4: - Simplify fadd and fmul for bit sizes < 64 (Connor). - Do not use double ffma for 32 bits float (Connor). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]	2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez	0ac07c7ca7	nir: add support for round to zero rounding mode to nir_op_f2f32 f2f16's rounding modes are already handled and f2f64 don't need it as there is not a floating point type with higher bit size than 64 for now. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-17 23:39:18 +03:00
Erico Nunes	4a407df682	nir/algebraic: add new fsum ops and fdot lowering The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can be used to optimize fdot lowering. fsum2 is not implemented and can be further lowered to an add with the vector components. Currently lima ppir handles this lowering internally, however this happens in a very late stage and requires a big chunk of code compared to a nir_opt_algebraic lowering. By having fsum in nir, we can reduce ppir complexity and enable the lowered ops to be part of other nir optimizations in the optimization loop. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-31 21:35:58 +02:00
Sagar Ghuge	81d342e2a1	nir: Add urol and uror opcodes Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Jonathan Marek	a70ff70158	nir: remove fnot/fxor/fand/for opcodes There doesn't seem to be any reason to keep these opcodes around: * fnot/fxor are not used at all. * fand/for are only used in lower_alu_to_scalar, but easily replaced Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-26 15:26:10 -04:00
Daniel Schürmann	a8b0b6e52b	nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select. bitfield_select is defined as: bitfield_select(mask, base, insert) = (mask & base) \| (~mask & insert) matching the behavior of AMD's BFI instruction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	165b7f3a44	nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Eduardo Lima Mitev	c27b3758fa	nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes 'umul_low' is the low 32-bits of unsigned integer multiply. It maps directly to ir3's MULL_U. 'imadsh_mix16' is multiply add with shift and mix, an ir3 specific instruction that maps directly to ir3's IMADSH_M16. Both are necessary for the lowering of integer multiplication on Freedreno, which will be introduced later in this series. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Ian Romanick	7b4ff6a1af	nir: Mark ffma as 2src_commutative This doesn't make any real difference now, but future work (not in this series) will add a LOT of ffma patterns. Having to duplicate all of them for ffma(a, b, c) and ffma(b, a, c) is just terrible. No shader-db changes on any Intel platform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	ede45bf9cf	nir: Rename commutative to 2src_commutative The meaning of the new name is that the first two sources are commutative. Since this is only currently applied to two-source operations, there is no change. A future change will mark ffma as 2src_commutative. It is also possible that future work will add 3src_commutative for opcodes like fmin3. v2: s/commutative_2src/2src_commutative/g. I had originally considered this, but I discarded it because I did't want to deal with identifiers that (should) start with 2. Jason suggested it in review, so we decided that _2src_commutative would be used in nir_opcodes.py. Also add some comments documenting what 2src_commutative means. Also suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	85e6865ff6	nir: Saturating integer arithmetic is not associative In 8-bits, iadd_sat(iadd_sat(0x7f, 0x7f), -1) = iadd_sat(0x7f, -1) = 0x7e but, iadd_sat(0x7f, iadd_sat(0x7f, -1)) = iadd_sat(0x7f, 0x7e) = 0x7f Fixes: `272e927d0e` ("nir/spirv: initial handling of OpenCL.std extension opcodes") Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-01 09:07:47 -07:00
Kristian H. Kristensen	41593f3c37	nir_opcodes.py: Saturate to expression that doesn't overflow Compiler warns about overflow when assigning UINT64_MAX to something smaller than a uin64_t: src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion] uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1); ~~~ ^~~~~~~~~~ Shift UINT64_MAX down to the appropriate maximum value for the type being assigned to. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-19 16:17:37 +00:00
Rhys Perry	8671cfe2a2	nir,ac/nir: fix cube_face_coord Seems it was missing the "/ ma + 0.5" and the order was swapped. Fixes: `a1a2a8dfda` ('nir: add AMD_gcn_shader extended instructions') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 17:22:47 +01:00
Samuel Pitoiset	6ae5797243	nir: use generic float types for frexp_exp and frexp_sig Only the exponent needs to be 32-bit signed integer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-22 19:41:44 +01:00
Iago Toral Quiroga	ca2b5e9069	compiler/nir: add an is_conversion field to nir_op_info This is set to True only for numeric conversion opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Karol Herbst	272e927d0e	nir/spirv: initial handling of OpenCL.std extension opcodes Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason move some of the lowering into opt_algebraic and create new nir opcodes simplify nextafter lowering fix normalize lowering for inf rework upsample to use nir_pack_bits add missing files to build systems v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Sagar Ghuge	e551040c60	nir/glsl: Add another way of doing lower_imul64 for gen8+ On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also instead of emitting two mul instructions, we can emit single mul instuction and extract low/high 32 bits from 64 bit result for [i/u]mulExtended v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Daniel Schürmann	0525bdc225	nir: Define shifts according to SM5 specification. SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts are defined to only use the least significant bits. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 12:59:43 -06:00
Jason Ekstrand	191a1dce92	nir: Add 1-bit Boolean opcodes We also have to add support for 1-bit integers while we're here so we get 1-bit variants of iand, ior, and inot. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Ian Romanick	090e282407	nir: Add a saturated unsigned integer add opcode Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	9525971e2b	nir: Allow [iu]mul_high on non-32-bit types Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00

1 2

92 commits