fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Dave Airlie	330e28155f	nir: add 32-bit bool of fisfinite Add the bool lowering as well. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>	2021-08-06 12:06:21 +10:00
Ian Romanick	72259a870f	util: Add and use functions to calculate min and max int for a size Many places need to know the maximum or minimum possible value for a given size integer... so everyone just open-codes their favorite version. There is some potential to hit either undefined or implementation-defined behavior, so having one version that Just Works seems beneficial. v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in u_intmin. Noticed by CI. Lol. Rename functions `s/u_(uint\|int)(min\|max)/u_\1N_\2/g`. Suggested by Jason. Add some unit tests that would have caught the copy-and-paste bug before wasting CI time. Change the implementation of u_intN_min to use the same pattern as stdint.h. This avoids the integer division. Noticed by Jason. v3: Add changes to convert_clear_color (src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>	2021-08-03 12:55:02 -07:00
Sagar Ghuge	e8dff256c0	nir: Add new opcode for ternary addition v2: - Make it 2src commutative (Connor Abbott) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>	2021-07-16 15:59:55 +00:00
Thomas H.P. Andersen	ffea622604	nir/ifind_msb_rev: fix input check ifind_msb_rev was introduced in `a5747f8ab3`. ifind_msb_rev guards against src0 being both 0 or -1 at the same time. That is always true. This patch changes it to check for those values individually. Spotted from a compile warning. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `a5747f8ab3` ("nir: add opcodes for *find_msb_rev and lowering") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11630>	2021-07-04 12:17:58 +00:00
Alyssa Rosenzweig	3da23a9c7e	nir: Fix constant folding for irhadd/urhadd This should be a subtract, not an add. The comment's proof is correct, but the (wrong) expression we actually use isn't what it's in the comment! Correct the discrepancy. The lowering in nir_opt_algebraic was correctly typed. Fixes: `272e927d0e` ("nir/spirv: initial handling of OpenCL.std extension opcodes") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11671>	2021-07-02 00:21:22 +00:00
Jason Ekstrand	f00b5a30f5	nir: Require vectorized ALU ops to be all-or-nothing Long ago, the semantics of bcsel were such that it took a single boolean value and selected between whole vectors. These days, it takes a vector boolean with the assumption that if you want the old behavior you can just use a .xxxx swizzle. There currently are no opcodes which use a output_size of 0 but have a scalar or fixed-vector input. Let's disallow it for now to force us to think through the semantics again if this ever comes up as something someone actually wants. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>	2021-06-21 16:46:59 +00:00
Jason Ekstrand	2e08bae9b3	nir,vc4: Suffix a bunch of unorm 4x8 opcodes _vc4 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>	2021-06-21 09:04:08 -05:00
Jason Ekstrand	0afbfee8da	nir,panfrost: Suffix fsat_signed and fclamp_pos with _mali Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>	2021-06-21 09:03:34 -05:00
Jason Ekstrand	f0f713960b	nir,amd: Suffix nir_op_cube_face_coord/index with _amd Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>	2021-06-21 09:03:34 -05:00
Timur Kristóf	c92dab8e2b	nir: Add nir_op_sad_u8x4 which corresponds to AMD's v_sad_u8. NIR currently doesn't have any intrinsics for a horizontal packed add, so this one is modeled after AMD's v_sad_u8. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>	2021-06-09 16:48:51 +00:00
Rhys Perry	1cbcfb8b38	nir, nir/algebraic: add byte/word insertion instructions Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>	2021-06-08 08:57:42 +00:00
Jesse Natalie	d7ca0319d7	nir: Add relaxed 24bit opcodes These are equivalent to the 32bit opcodes if there are no more efficient 24bit opcodes available, but inputs are guaranteed to already be 24bit, so the 24bit opcodes can be used instead if they exist and are efficient. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10549>	2021-05-05 22:06:42 +00:00
Alyssa Rosenzweig	a976101da5	nir/opcodes: Reword confusing comment Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10578>	2021-05-03 12:51:47 +00:00
Alyssa Rosenzweig	0ea67e57e5	nir: Add fsin_agx opcode Used to split up the fsin/fcos lowering for AGX between NIR and the backend, to permit algebraic optimizations without polluting NIR with too many hardware details. The backend NIR lowering produces an fmul/ffma of the input so we can optimize code like sin(2*x). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10582>	2021-05-02 17:41:09 -04:00
Jesse Natalie	3c8bcdc863	nir: Add a new opcode for [un]packing doubles HLSL doesn't support bitcasting a 64bit integer to a double. DXIL doesn't have generic pack/unpack instructions, so we lower those to integer bitwise ops. As a result, NIR generic double pack/unpack would require our backend to emit a bitcast to get a double, but we want to match HLSL semantics and emit MakeDouble/SplitDouble. Adding a dedicated opcode for double pack/unpack allows us to add a pass to emit that instead, which lets our backend emit the right instruction to pack and unpack doubles. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>	2021-04-09 01:54:33 +00:00
Gert Wollny	318701b803	nir: Add r600 specific sin and cos variants r600 expects the input values to be normalized by dividing by 2 * PI, so add an opcode to be able to lower this in nir. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>	2021-03-22 15:19:46 +01:00
Gert Wollny	0f5b3c37c5	nir: Add opcodes for fused comp + csel and optimizations Some backends, like r600, support a fused version of int and float compare against zero and csel. Adding these opcodes here makes it possible to optimize this in nir. v2: Add rules for float compare + csel Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>	2021-03-22 15:19:46 +01:00
Gert Wollny	a5747f8ab3	nir: add opcodes for find_msb_rev and lowering Some hardware supports a version of find_msb where the bits are counted starting at the high bit, and this needs some lowering to obtain the value that is expected by find_msb Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>	2021-03-22 15:19:46 +01:00
Gert Wollny	e5db9c3dd4	nir: Add r600 specific CUBE opcode to evaluate cube texture coords and face The opcode evaluates the unnormalized coordinates, the length of the major axis, and the cube face. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Acked-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9200>	2021-02-26 09:51:37 +01:00
Rhys Perry	95819663b7	nir: allow 5 component vectors These will be useful for sparse texture instructions and image load intrinsics. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7774>	2021-01-06 20:36:38 +00:00
Ian Romanick	71961c73a9	nir: Correctly constant fold fsign(NaN) and fsign(-0) GLSL and SPIR-V GLSL.std.450 don't have any requirements for fsign(NaN), and both only require that FSign(-0.0) == 0.0. OpenCL, on the other hand, requires sign(-0.0) be exactly -0.0. It also requires that sign(NaN) be exactly 0.0. In practice, this change is difficult to test. Our GLSL frontend already constant folds sign(NaN) to 0.0 before even getting to NIR. As far as I can tell, glslang does the same. I don't have a good way to run an OpenCL SPIR-V test. Maybe SPIR-V GLSL.std.450 assembly? No shader-db or fossil-db changes on any Intel platform. Acked-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>	2021-01-05 02:07:09 +00:00
Ian Romanick	363efc2823	nir: Make some notes about fsign versus NaN This commit only documents the current behavior, even if that behavior is not the behavior preferred by the relevant specs. In SPIR-V, there are two flavors of the sign instruction, and each lives in an extended instruction set. The GLSL.std.450 FSign instruction is defined as: Result is 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0. This also matches the GLSL 4.60 definition. However, the OpenCL.ExtendedInstructionSet.100 sign instruction is defined as: Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if x < 0. Returns 0.0 if x is a NaN. There are two differences. Each treats -0.0 differently, and each also treats NaN differently. Specifically, GLSL.std.450 FSign does not define any specific behavior for NaN. There has been some discussion in Khronos about the NaN behavior of GLSL.std.450 FSign. As part of that discussion, I did some research into how we treat NaN for nir_op_fsign, and this commit just captures some of those notes. v2: Document the expected behavior of nir_op_fsign more thoroughly. Suggested by Rhys. Note that the current implementation of constant folding does not produce the expected result for NaN. Suggested by Caio. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1] Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>	2021-01-05 02:07:09 +00:00
Rhys Perry	24a18b1a4b	nir: scalarize fdot in reverse This will create code that is easier to combine into MADs/FMA when the last component is 1.0. nir_opt_algebraic_late has an optimization to do something similar but it only works for inexact code, if the multiplication-by-1 optimization is done before it and if the backend enables fuse_ffma. fossil-db (Navi): Totals from 85583 (74.64% of 114665) affected shaders: SGPRs: 4556060 -> 4558596 (+0.06%); split: -0.07%, +0.12% VGPRs: 3315060 -> 3312984 (-0.06%); split: -0.23%, +0.17% SpillSGPRs: 13552 -> 13553 (+0.01%) CodeSize: 184962756 -> 184431388 (-0.29%); split: -0.32%, +0.03% MaxWaves: 1208693 -> 1209361 (+0.06%); split: +0.17%, -0.11% Instrs: 35678819 -> 35361617 (-0.89%); split: -0.91%, +0.02% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>	2020-11-03 14:56:00 +00:00
Ian Romanick	67956689bb	nir: Rename replicated-result dot-product instructions All these instructions replicate the result of a N-component dot-product to a vec4. Naming them fdot_replicatedN gives the impression that they are some sort of abstract dot-product that replicates the result to a vecN. They also deviate from fdph_replicated... which nobody would reasonably consider naming fdot_replicatedh. Naming these opcodes fdotN_replicated more closely matches what they are, and it matches the pattern of fdph_replicated. I believe that the only reason these opcodes were named this way was because it simplified the implementation of the binop_reduce function in nir_opcodes.py. I made some fairly simple changes to that function, and I think the end result is ok. The bulk of the changes come from the sed rename: sed --in-place -e 's/fdot_replicated\([234]\)/fdot\1_replicated/g' \ $(grep -r 'fdot_replicated[234]' src/) v2: Use a named parameter to binop_reduce instead of using isinstance(name, str). Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5725>	2020-10-22 18:00:19 +00:00
Tony Wasserka	6a9dc75cc2	nir: Fix undefined behavior due to signed integer multiplication overflows Notably this happened when applying constant folding on the intermediate computations generated from nir_lower_idiv. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>	2020-10-07 19:50:01 +00:00
Marek Olšák	cdd498bbe8	nir: add new mediump opcodes f2[ui]mp, i2fmp, u2fmp Algebraic optimizations will select them. Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>	2020-09-10 23:35:13 +00:00
Marek Olšák	385b4dbc39	nir: enforce 32-bit src type requirement for f2fmp and i2imp Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>	2020-09-10 23:35:13 +00:00
Marek Olšák	3d3df8dbff	nir: remove redundant opcode u2ump Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>	2020-09-10 23:35:13 +00:00
Daniel Schürmann	a79dad950b	nir,amd: remove trinary_minmax opcodes These consist of the variations nir_op_{i|u|f}{min|max|med}3 which are either lowered in the backend (LLVM) anyway or can be recombined by the backend (ACO). Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6421>	2020-08-24 20:56:11 +00:00
Karol Herbst	e5899c1e88	nir: rename nir_op_fne to nir_op_fneu It was always fneu but naming it fne causes confusion from time to time. So let's rename it. Later we also want to add other unordered and fne, this is a smaller preparation for that. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>	2020-08-21 17:26:21 +00:00
Rhys Perry	27ec38d746	nir: fix potential left shift of a negative value Fixes UBSan error: src/compiler/nir/nir_constant_expressions.c:36573:32: runtime error: left shift of negative value -1 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6206>	2020-08-20 10:52:19 +00:00
Jesse Natalie	af59e4c400	nir: Add fisfinite op Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6355>	2020-08-17 15:34:08 -07:00
Jesse Natalie	9ebbed6ddc	nir: Add fisnormal op Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6355>	2020-08-17 15:34:00 -07:00
Jesse Natalie	456edf0b30	nir: Support 8 and 16 component vectors for reduceable intrinsics Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6030>	2020-07-23 18:23:20 -07:00
Rhys Perry	9a389322c4	nir: slight correction to cube_face_coord constant folding ACO does the division with a rcp and then a multiplication. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5547>	2020-06-22 10:28:40 +00:00
Marek Olšák	f798513f91	nir: add i2imp and u2ump opcodes for conversions to mediump Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5002>	2020-06-02 20:01:18 +00:00
Alyssa Rosenzweig	fcbc022787	nir: Add un/pack_32_4x8 opcodes Complement the existing un/pack_32_2x16 opcodes. These are useful for 8-bit format packing. On Midgard, they are equivalent to just a 32-bit move, but other GPUs could lower to other packs if needed. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5107>	2020-05-25 20:03:52 +00:00
Alyssa Rosenzweig	c2b0f3c17d	nir: Add fclamp_pos opcode Corresponds to the .pos modifier on all Mali GPUs (lima and panfrost). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>	2020-05-19 20:21:27 +00:00
Alyssa Rosenzweig	0aedce417a	nir: Add fsat_signed opcode Exists on later Mali. Equivalent to clamp(x, -1.0, 1.0) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>	2020-05-19 20:21:27 +00:00
Rhys Perry	abc4a82857	nir: make fsat return 0.0 with NaN instead of passing it through This is how lower_fsat and ACO implement fsat and is a more useful definition since it can be exactly created from fmin(fmax(a, 0.0), 1.0). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3716>	2020-05-07 10:39:19 +00:00
Gert Wollny	49ce749d0e	nir: Add umad24 and umul24 opcodes So far only the signed versions are defined. v2: Make umad24 and umul24 non-driver specific (Eric Anholt) v3: Take care of nir_builder and automatic lowering of the opcodes if they are not supported by the backend. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>	2020-04-23 18:23:04 +00:00
Jason Ekstrand	7c43b8ce1b	nir: Delete the fnoise opcodes As of the previous commit, they are never used. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>	2020-04-21 06:16:13 +00:00
Rob Clark	bf64648864	nir: fix definition of imadsh_mix16 for vectors Fixes: `c27b3758fa` ("nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes") Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>	2020-04-04 00:07:10 +00:00
Jason Ekstrand	b2db84153a	nir: Add b2b opcodes These exist to convert between different types of boolean values. In particular, we want to use these for uniform and shared memory operations where we need to convert to a reasonably sized boolean but we don't care what its format is so we don't want to make the back-end insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the format of the uniform boolean to whatever the driver wants. In the case of shared, every value in a shared variable comes from the shader so it's already in the right boolean format. The new boolean conversion opcodes get replaced with mov in lower_bool_to_int/float32 so the back-end will hopefully never see them. However, while we're in the middle of optimizing our NIR, they let us have sensible load_uniform/ubo intrinsics and also have the bit size conversion. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Albert Astals Cid	d988061172	cube_face_index: Use fabsf instead of fabs since we know it's floats Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>	2020-02-26 21:47:01 +00:00
Albert Astals Cid	6db7467b59	cube_face_coord: Use fabsf instead of fabs since we know it's floats Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>	2020-02-26 21:47:01 +00:00
Neil Roberts	125f867d3d	nir/opcodes: Add nir_op_f2fmp This opcode is the same as the f2f16 opcode except that it comes with a promise that it is safe to optimise it out if the result is immediately converted back to a 32-bit float again. Normally this would be a lossy conversion and so it would be visible to the application, but if the conversion is generated as part of the mediump lowering process then this removal doesn’t matter. The opcode is eventually replaced with a regular f2f16 in the late optimisations so the backends don’t need to handle it. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>	2020-02-24 17:24:13 +00:00
Ian Romanick	1d97d186fb	nir: Mark fmin and fmax as commutative and associative Per the resolution of Khronos GLSL issue 80 (https://github.com/KhronosGroup/GLSL/issues/80). Spec updates have not landed yet, but I'll get to it soon. :) The extra hurt shaders on Gen8+ are a handful of shaders that see things like bcsel(fmin(b - a, a - c) >= 0, x, y) converted to bcsel(a >= b && c >= a, x, y) The former can be generated as a CSEL instruction. If either b - a or a - c is used elsewhere in the shader, this saves an instruction. All Haswell+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14550188 -> 14550048 (<.01%) instructions in affected programs: 12168 -> 12028 (-1.15%) helped: 30 HURT: 3 helped stats (abs) min: 1 max: 17 x̄: 4.77 x̃: 2 helped stats (rel) min: 0.05% max: 3.85% x̄: 1.77% x̃: 1.80% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.50% max: 0.50% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for instructions value: -6.15 -2.33 95% mean confidence interval for instructions %-change: -2.00% -1.12% Instructions are helped. total cycles in shared programs: 203770286 -> 203771464 (<.01%) cycles in affected programs: 688466 -> 689644 (0.17%) helped: 172 HURT: 220 helped stats (abs) min: 1 max: 286 x̄: 12.15 x̃: 6 helped stats (rel) min: 0.03% max: 5.97% x̄: 0.70% x̃: 0.35% HURT stats (abs) min: 1 max: 578 x̄: 14.85 x̃: 6 HURT stats (rel) min: 0.03% max: 32.36% x̄: 1.21% x̃: 0.52% 95% mean confidence interval for cycles value: -0.74 6.75 95% mean confidence interval for cycles %-change: 0.15% 0.59% Inconclusive result (value mean confidence interval includes 0). 
total fills in shared programs: 4525 -> 4523 (-0.04%) fills in affected programs: 48 -> 46 (-4.17%) helped: 1 HURT: 0 Ivy Bridge total instructions in shared programs: 11858995 -> 11858898 (<.01%) instructions in affected programs: 10822 -> 10725 (-0.90%) helped: 25 HURT: 13 helped stats (abs) min: 1 max: 17 x̄: 5.32 x̃: 2 helped stats (rel) min: 0.40% max: 5.00% x̄: 2.16% x̃: 1.85% HURT stats (abs) min: 1 max: 15 x̄: 2.77 x̃: 2 HURT stats (rel) min: 0.47% max: 2.90% x̄: 1.83% x̃: 2.15% 95% mean confidence interval for instructions value: -4.66 -0.45 95% mean confidence interval for instructions %-change: -1.54% -0.05% Instructions are helped. total cycles in shared programs: 177947023 -> 177946880 (<.01%) cycles in affected programs: 822075 -> 821932 (-0.02%) helped: 157 HURT: 175 helped stats (abs) min: 1 max: 164 x̄: 13.17 x̃: 4 helped stats (rel) min: 0.03% max: 6.72% x̄: 0.64% x̃: 0.17% HURT stats (abs) min: 1 max: 308 x̄: 11.00 x̃: 4 HURT stats (rel) min: 0.03% max: 9.76% x̄: 0.70% x̃: 0.18% 95% mean confidence interval for cycles value: -3.86 3.00 95% mean confidence interval for cycles %-change: -0.09% 0.22% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4185 -> 4188 (0.07%) spills in affected programs: 146 -> 149 (2.05%) helped: 0 HURT: 1 total fills in shared programs: 5248 -> 5249 (0.02%) fills in affected programs: 347 -> 348 (0.29%) helped: 0 HURT: 1 Sandy Bridge total instructions in shared programs: 10680224 -> 10680144 (<.01%) instructions in affected programs: 4702 -> 4622 (-1.70%) helped: 15 HURT: 3 helped stats (abs) min: 1 max: 17 x̄: 5.53 x̃: 5 helped stats (rel) min: 0.39% max: 4.76% x̄: 2.17% x̃: 1.67% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.52% max: 0.52% x̄: 0.52% x̃: 0.52% 95% mean confidence interval for instructions value: -7.24 -1.65 95% mean confidence interval for instructions %-change: -2.55% -0.89% Instructions are helped. 
total cycles in shared programs: 152988780 -> 152985691 (<.01%) cycles in affected programs: 1072850 -> 1069761 (-0.29%) helped: 168 HURT: 145 helped stats (abs) min: 1 max: 592 x̄: 33.90 x̃: 12 helped stats (rel) min: 0.02% max: 10.73% x̄: 0.90% x̃: 0.31% HURT stats (abs) min: 1 max: 259 x̄: 17.98 x̃: 6 HURT stats (rel) min: 0.02% max: 8.17% x̄: 0.77% x̃: 0.19% 95% mean confidence interval for cycles value: -17.95 -1.79 95% mean confidence interval for cycles %-change: -0.34% 0.08% Inconclusive result (%-change mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8107033 -> 8107025 (<.01%) instructions in affected programs: 696 -> 688 (-1.15%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.60 x̃: 2 helped stats (rel) min: 0.34% max: 7.14% x̄: 3.47% x̃: 4.65% 95% mean confidence interval for instructions value: -2.28 -0.92 95% mean confidence interval for instructions %-change: -7.22% 0.28% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 188348526 -> 188348404 (<.01%) cycles in affected programs: 33618 -> 33496 (-0.36%) helped: 23 HURT: 0 helped stats (abs) min: 2 max: 12 x̄: 5.30 x̃: 6 helped stats (rel) min: 0.05% max: 1.83% x̄: 0.47% x̃: 0.51% 95% mean confidence interval for cycles value: -6.70 -3.91 95% mean confidence interval for cycles %-change: -0.64% -0.30% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1359>	2020-02-10 18:37:36 -08:00
Ian Romanick	21f0d020fe	nir: Add new instructions for INTEL_shader_integer_functions2 uctz isn't added because it will be implemented in the GLSL path and the SPIR-V path using other pre-existing instructions. v2: Avoid signed integer overflow for uabs_isub(0, INT_MIN). Noticed by Caio. v3: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN). I tried the previous method in a small test program with -ftrapv, and it failed. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>	2020-01-23 00:18:57 +00:00
Rob Clark	a8ec4082a4	nir+vtn: vec8+vec16 support This introduces new vec8 and vec16 instructions (which are the only instructions taking more than 4 sources), in order to construct 8 and 16 component vectors. In order to avoid fixing up the non-autogenerated nir_build_alu() sites and making them pass 16 src args for the benefit of the two instructions that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() has nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is used for the > 4 src args case). v2 (Karol Herbst): use nir_build_alu2 for vec8 and vec16 use python's array multiplication syntax add nir_op_vec helper simplify nir_vec nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert use nir_build_alu for opcodes with <= 4 sources v3 (Karol Herbst): fix nir_serialize v4 (Dave Airlie): fix serialization of glsl_type handle vec8/16 in lowering of bools v5 (Karol Herbst): fix load store vectorizer Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-12-21 11:00:17 +00:00
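
Note on 3da23a9c7e (irhadd/urhadd constant folding): the commit says the folded expression "should be a subtract, not an add". A minimal sketch of the usual overflow-free rounding half-add identity follows, assuming that identity is the one the commit refers to; the exact expression in nir_opcodes.py is not quoted in this log. The function names and the bits parameter are hypothetical and used only for illustration.

```python
def urhadd(a: int, b: int, bits: int = 32) -> int:
    """Unsigned rounding half-add: (a + b + 1) >> 1 without a wider type.

    Uses a + b == 2 * (a | b) - (a ^ b), which gives
    (a + b + 1) >> 1 == (a | b) - ((a ^ b) >> 1).
    Note the subtraction; the add form belongs to the truncating
    half-add, (a & b) + ((a ^ b) >> 1).
    """
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    return ((a | b) - ((a ^ b) >> 1)) & mask


def urhadd_reference(a: int, b: int) -> int:
    """Reference using Python's arbitrary-precision integers."""
    return (a + b + 1) >> 1
```

Checking urhadd against urhadd_reference over every pair of 8-bit inputs is a cheap way to catch a sign error like the one this commit fixed.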


127 commits