fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Topi Pohjolainen	ea42ba36b9	intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27 Similarly to `1cc17fb731` Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-17 14:55:49 +03:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Jason Ekstrand	6b1c398bcb	intel/nir: Take a nir_tex_instr and src index in brw_texture_offset This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.	2019-04-14 22:25:56 +02:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Caio Marcelo de Oliveira Filho	ef0339d5ea	intel/fs: Use TEX_LOGICAL whenever implicit lod is supported Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Ian Romanick	55e6454d5e	intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer In the old code, we would generate the exact same instruction for extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word trick only works for even numbered bytes. This fixes the (new) piglit test tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test. v2: Use a SHR instead of an AND. This saves an instruction compared to using two moves. Suggested by Jason. Fixes: `6ac2d16901` ("i965/fs: Fix extract_i8/u8 to a 64-bit destination") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:45 -08:00
Ian Romanick	4aaf139ea4	intel/fs: nir_op_extract_i8 extracts a byte, not a word Fixes: `6ac2d16901` ("i965/fs: Fix extract_i8/u8 to a 64-bit destination") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:42 -08:00
Jason Ekstrand	61e009d2c4	spirv: Use the same types for resource indices as pointers We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Sagar Ghuge	e551040c60	nir/glsl: Add another way of doing lower_imul64 for gen8+ On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also instead of emitting two mul instructions, we can emit single mul instuction and extract low/high 32 bits from 64 bit result for [i/u]mulExtended v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Ian Romanick	d1d56f5f9a	intel/fs: Don't assert on b2f with a saturate modifier This ran afoul of Iris's use of nir_lower_clamp_color_outputs which applies fsat() before writes to vertex shader color outpus. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `7725d60938` ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")	2019-03-02 13:58:50 -08:00
Ian Romanick	1edf67fc3f	intel/fs: Generate if instructions with inverted conditions Per-platform results were all over the place, so I have included all the results here. There is an important note at the bottom of the commit message. Skylake total instructions in shared programs: 15184683 -> 15184679 (<.01%) instructions in affected programs: 2786 -> 2782 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 370961367 -> 370961173 (<.01%) cycles in affected programs: 205867 -> 205673 (-0.09%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16 helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -93.01 28.34 95% mean confidence interval for cycles %-change: -0.82% 0.08% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 15465366 -> 15465362 (<.01%) instructions in affected programs: 2799 -> 2795 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 410938419 -> 410938531 (<.01%) cycles in affected programs: 566028 -> 566140 (0.02%) helped: 18 HURT: 17 helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1 helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01% HURT stats (abs) min: 1 max: 12 x̄: 10.29 x̃: 12 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09% 95% mean confidence interval for cycles value: 0.31 6.09 95% mean confidence interval for cycles %-change: -0.10% 0.05% Inconclusive result (%-change mean confidence interval includes 0). Haswell total instructions in shared programs: 13749760 -> 13749759 (<.01%) instructions in affected programs: 2241 -> 2240 (-0.04%) helped: 1 HURT: 0 total cycles in shared programs: 385398913 -> 385398363 (<.01%) cycles in affected programs: 554914 -> 554364 (-0.10%) helped: 31 HURT: 1 helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6 helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -45.88 11.51 95% mean confidence interval for cycles %-change: -0.05% -0.02% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total cycles in shared programs: 180663626 -> 180663881 (<.01%) cycles in affected programs: 472350 -> 472605 (0.05%) helped: 15 HURT: 30 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 8 max: 10 x̄: 9.00 x̃: 9 HURT stats (rel) min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: 4.21 7.12 95% mean confidence interval for cycles %-change: 0.05% 0.08% Cycles are HURT. Sandy Bridge total cycles in shared programs: 154568664 -> 154569225 (<.01%) cycles in affected programs: 356486 -> 357047 (0.16%) helped: 1 HURT: 31 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02% HURT stats (abs) min: 4 max: 33 x̄: 18.16 x̃: 8 HURT stats (rel) min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for cycles value: 12.19 22.87 95% mean confidence interval for cycles %-change: 0.10% 0.16% Cycles are HURT. Iron Lake total instructions in shared programs: 8206589 -> 8206565 (<.01%) instructions in affected programs: 3024 -> 3000 (-0.79%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.82% -0.77% Instructions are helped. total cycles in shared programs: 187657428 -> 187656228 (<.01%) cycles in affected programs: 95748 -> 94548 (-1.25%) helped: 12 HURT: 0 helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100 helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21% 95% mean confidence interval for cycles value: -113.27 -86.73 95% mean confidence interval for cycles %-change: -1.43% -1.11% Cycles are helped. GM45 total instructions in shared programs: 5037569 -> 5037557 (<.01%) instructions in affected programs: 1521 -> 1509 (-0.79%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.83% -0.75% Instructions are helped. total cycles in shared programs: 128101478 -> 128100758 (<.01%) cycles in affected programs: 52746 -> 52026 (-1.37%) helped: 6 HURT: 0 helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120 helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41% 95% mean confidence interval for cycles value: -120.00 -120.00 95% mean confidence interval for cycles %-change: -1.70% -1.12% Cycles are helped. This change has almost no effect right now. However, removing this patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f with a b2f(!(a \|\| b))") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071022 -> 15089710 (0.12%) instructions in affected programs: 1022219 -> 1040907 (1.83%) helped: 1 HURT: 3937 helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41 helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01% HURT stats (abs) min: 1 max: 256 x̄: 4.76 x̃: 4 HURT stats (rel) min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60% 95% mean confidence interval for instructions value: 4.56 4.93 95% mean confidence interval for instructions %-change: 2.54% 2.64% Instructions are HURT. total cycles in shared programs: 369777134 -> 370092923 (0.09%) cycles in affected programs: 17516573 -> 17832362 (1.80%) helped: 115 HURT: 3624 helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28 helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65% HURT stats (abs) min: 1 max: 12640 x̄: 89.71 x̃: 54 HURT stats (rel) min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52% 95% mean confidence interval for cycles value: 75.21 93.71 95% mean confidence interval for cycles %-change: 4.43% 4.64% Cycles are HURT. total spills in shared programs: 9450 -> 9442 (-0.08%) spills in affected programs: 166 -> 158 (-4.82%) helped: 2 HURT: 0 total fills in shared programs: 21115 -> 21094 (-0.10%) fills in affected programs: 438 -> 417 (-4.79%) helped: 2 HURT: 0 LOST: 1 GAINED: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	7725d60938	intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a)) Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a)) maps -1 => 0.0 and 0 => 1.0. This is equivalent to 1.0 + float(boolBitsToInt(a)). On Intel GPUs, ADD is one of the few instructions that can type-convert during write to destination, so we can achieve this in a single instruction: add g47F, g26D, 1D v2: Fix swizzles. v3: Fix typos in comments. Noticed by Ken. All Gen6+ platforms had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185583 -> 15184683 (<.01%) instructions in affected programs: 239389 -> 238489 (-0.38%) helped: 899 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09% 95% mean confidence interval for instructions value: -1.01 -0.99 95% mean confidence interval for instructions %-change: -0.51% -0.48% Instructions are helped. total cycles in shared programs: 370964249 -> 370961508 (<.01%) cycles in affected programs: 1487586 -> 1484845 (-0.18%) helped: 420 HURT: 268 helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6 helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41% HURT stats (abs) min: 1 max: 230 x̄: 24.90 x̃: 10 HURT stats (rel) min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52% 95% mean confidence interval for cycles value: -7.61 -0.36 95% mean confidence interval for cycles %-change: -0.44% -0.02% Cycles are helped. No changes on Iron Lake or GM45. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	cb3e21cd19	intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+ Instead of emitting ~(a & b), emit (~a \| ~b) since logical-not of operands is free on Gen8+. v2: Fix swizzles. Fix types for cmod propagation. v3: Simplify logic for inverting source of inot(ixor(a, b)). Suggested by Ken. Skylake and Broadwell had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185593 -> 15185583 (<.01%) instructions in affected programs: 5673 -> 5663 (-0.18%) helped: 12 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -1.66 0.13 95% mean confidence interval for instructions %-change: -2.60% -0.15% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 370977726 -> 370964249 (<.01%) cycles in affected programs: 869987 -> 856510 (-1.55%) helped: 15 HURT: 2 helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16 helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53% HURT stats (abs) min: 14 max: 42 x̄: 28.00 x̃: 28 HURT stats (rel) min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13% 95% mean confidence interval for cycles value: -1654.87 69.34 95% mean confidence interval for cycles %-change: -2.29% -0.23% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	8eb36c9129	intel/fs: Emit logical-not of operands on Gen8+ On Gen8+ specifying negation of a logical operation such as AND actually performs a logical-not. Take advantage of this to generate fewer instructions. v2: Major rebase. Use nir_src_as_alu_instr. Fix swizzle handling. No changes on any pre-Gen8 platform. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15466902 -> 15466274 (<.01%) instructions in affected programs: 1262953 -> 1262325 (-0.05%) helped: 682 HURT: 4 helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04% HURT stats (abs) min: 1 max: 62 x̄: 17.50 x̃: 3 HURT stats (rel) min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10% 95% mean confidence interval for instructions value: -1.10 -0.73 95% mean confidence interval for instructions %-change: -0.19% -0.15% Instructions are helped. total cycles in shared programs: 410996093 -> 410950440 (-0.01%) cycles in affected programs: 144389048 -> 144343395 (-0.03%) helped: 519 HURT: 51 helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140 helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03% HURT stats (abs) min: 1 max: 4060 x̄: 167.90 x̃: 22 HURT stats (rel) min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25% 95% mean confidence interval for cycles value: -97.16 -63.02 95% mean confidence interval for cycles %-change: -0.32% -0.13% Cycles are helped. total spills in shared programs: 95311 -> 95329 (0.02%) spills in affected programs: 881 -> 899 (2.04%) helped: 0 HURT: 4 total fills in shared programs: 93629 -> 93634 (<.01%) fills in affected programs: 794 -> 799 (0.63%) helped: 1 HURT: 2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	06eaaf2de9	intel/fs: Refactor ALU source and destination handling to a separate function Other places will need to do this soon to properly handle source swizzles. The patch looks a little odd, but the change is pretty straight forward. All of the swizzle and mask handling is moved out, but the code for handling move instructions and vecN instructions remains in nir_emit_alu. I'm not terribly pleased with the "need_dest" parameter, but get_nir_dest is (somewhat surprisingly) destructive. I am open to suggestions of alternatives. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Jason Ekstrand	9d437f9482	intel/fs: Drop the fs_surface_builder All of the actual abstraction (except possibly setting size_written) happens as part of the logical opcodes. The only thing that the surface builder is providing at this point is extra levels of functions to call through. I'm going to be adding bindless image support soon and all the extra abstraction here is just getting in the way. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	367b0ede4d	intel/fs: Bail in optimize_extract_to_float if we have modifiers This fixes a bug in runscape where we were optimizing x >> 16 to an extract and then negating and converting to float. The NIR to fs pass was dropping the negate on the floor breaking a geometry shader and causing it to render nothing. Fixes: `1f862e923c` "i965/fs: Optimize float conversions of byte/word..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-02-14 23:02:44 -06:00
Eric Anholt	8f3694e1ab	intel: Use the NIR lowering for isign. Drops one instruction from fs-sign-int.shader_test. No change in shader-db due to it having 0 instances of sign(genIType). This may hurt isign64 if algebraic runs before int64 lowering, but I wasn't sure how to mark the algebraic opt as "every bit size but 64". v2: Update commit message about shader-db. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)	2019-02-14 00:32:30 +00:00
Jason Ekstrand	fd77606b5b	intel/fs: Use enumerated array assignments in fb read TXF setup It's more clear and means we don't have to update the array every time we add an optional texture instruction argument Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-11 10:57:09 -06:00
Jason Ekstrand	e644ed468f	intel/fs: Implement nir_intrinsic_global_atomic_* eviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	1c25bf4373	intel/fs: Implement load/store_global with A64 untyped messages eviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Oscar Blumberg	fea5b8e5ad	intel/fs: Fix memory corruption when compiling a CS Missing check for shader stage in the fs_visitor would corrupt the cs_prog_data.push information and trigger crashes / corruption later when uploading the CS state. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 10:53:33 -08:00
Jason Ekstrand	f547cebbe0	intel/fs: Use a logical opcode for IMAGE_SIZE Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Kenneth Graunke	04c2f12ab2	i965: Drop mark_surface_used mechanism. The original idea was that the backend compiler could eliminate surfaces, so we would have it mark which ones are actually used, then shrink the binding table accordingly. Unfortunately, it's a pretty blunt mechanism - it can only prune things from the end, not the middle - since we decide the layout before we even start the backend compiler, and only limit the size. It also basically gives up if it sees indirect array access. Besides, we do the vast majority of our surface elimination in NIR anyway, not the backend - and I don't see that trend changing any time soon. Vulkan abandoned this plan a long time ago, and I don't use it in Iris, but it's still been kicking around in i965. I hacked shader-db to print the binding table size in bytes, and observed no changes with this patch. So, this code appears to do nothing useful. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-13 09:35:32 -08:00
Matt Turner	7e4e9da90d	intel/compiler: Prevent warnings in the following patch The next patch replaces an unsigned bitfield with a plain unsigned, which triggers gcc to begin warning on signed/unsigned comparisons. Keeping this patch separate from the actual move allows bisectablity and generates no additional warnings temporarily. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Francisco Jerez	230a8a541d	intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes. These are broken on a future platform, but it turns out we don't need to fix them, since they're just type-converting moves with strided source. Kill them. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	cbea91eb57	intel/fs: Remove nasty open-coded CHV/BXT 64-bit workarounds. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Ian Romanick	e639d39faf	i965/fs: Implement nir_op_uadd_sat Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	cb98e0755f	intel/fs: Support min_lod parameters on texture instructions We have to lower some shadow instructions because they don't exist in hardware and we have to lower txb+offset+clamp because the message gets too big and we run into the sampler message length limit of 11 regs. Acked-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Matt Turner	017199d2d2	mesa: Revert INTEL_fragment_shader_ordering support This extension is not properly tested (testing for GL_ARB_fragment_shader_interlock is not sufficient), and since this was noted in review on August 28th no tests have been sent. Revert "i965: Add INTEL_fragment_shader_ordering support." Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering" This reverts commit `03ecec9ed2`. This reverts commit `119435c877`. Cc: mesa-stable@lists.freedesktop.org Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Anholt <eric@anholt.net>	2018-12-03 15:37:37 -08:00
Jason Ekstrand	dca35c598d	intel/fs,vec4: Fix a compiler warning ../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare] assert(nir_intrinsic_write_mask(instr) == ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ (1 << instr->num_components) - 1); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This was caused by `6339aba775` which added these completely valid checks. However clang likes to complain about signedness mismatches. Fixes: `6339aba775` "intel/compiler: Lower SSBO and shared..." Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-19 09:57:41 -06:00
Jason Ekstrand	6339aba775	intel/compiler: Lower SSBO and shared loads/stores in NIR We have a bunch of code to do this in the back-end compiler but it's fairly specific to typed surface messages and the way we emit them. This breaks it out into NIR were it's easier to do things a bit more generally. It also means we can easily share the code between the vec4 and FS back-ends if we wish. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:49 -06:00
Jason Ekstrand	d28bc35ece	intel/fs: Add an assert to optimize_frontfacing_ternary Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	344cfe6980	intel/fs: Use the new nir_src_is_const and friends As of this commit, all uses of const sources either go through a nir_src_as_<type> helper which handles bit sizes correctly or else are accompanied by a nir_src_bit_size() == 32 assertion to assert that we have the size we think we have. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:20 -06:00
Jason Ekstrand	6b2918709a	intel/fs,vec4: Clean up a repeated pattern with SSBOs Everywhere we handle SSBO intrinsics, we have exactly the same pattern for computing the index so we may as well make a helper for it. We also add a get_nir_src_imm to vec4 and use it for SSBO offsets. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:06 -06:00
Jason Ekstrand	d3a0d8b750	intel/compiler: Stop assuming the entrypoint is called "main" This isn't true for Vulkan so we have to whack it to "main" in anv which is silly. Instead of walking the list of functions and asserting that everything is named "main" and hoping there's only one function named "main", just use the nir_shader_get_entrypoint() helper which has better assertions anyway. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 20:14:52 -05:00
Jason Ekstrand	497675c21e	intel/fs: Fix nir_op_b2[fi] with 64-bit result on Gen8 LP and Gen9 LP Several of the Atom GPUs have additional restrictions on alignment when moving < 64-bit source to a 64-bit destination. All of the nir_op_264 code generation paths respected this, but nir_op_b2[fi] did not. Previous to commit `a68dd47b91` it was not possible to generate such an instruction from the GLSL path. It may have been possible from SPIR-V, but it's not clear. The aforementioned patch converts a 64-bit nir_op_fsign into a sequence of operations including a nir_op_b2f with a 64-bit result. This "just works" everywhere except these Atom parts. This problem was not detected during normal CI testing because the Atom parts are not included in developer builds. v2 (idr): Make the patch compile, and make some cosmetic changes. Add a commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319 Fixes: `a68dd47b91` "nir/algebraic: Simplify fsat of fsign" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Ian Romanick	b44c9292b7	intel/compiler: Don't handle fsign.sat No shader-db or CI changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Dylan Baker	8396043f30	Replace uses of _mesa_bitcount with util_bitcount and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-07 10:21:26 -07:00
Eric Engestrom	07ff56791d	intel/compiler: remove unused get_image_base_type() Unused since `09f1de97a7` "anv,i965: Lower away image derefs in the driver". Cc: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-06 15:22:24 +01:00
Alejandro Piñeiro	4e1f8d82c2	i965/fs: include multisamplers on image_intrinsic_coord_components This is the second patch needed to fix the following piglit tests: tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test Although in this case it doesn't affect so many borrowed tests, as there aren't too many tests using multisamplers on Intel. It is worth to note that this patch is also needed when those tests are run on GLSL mode (using the --glsl option). Although most Intel drivers would not be able to run/execute tests using multisamplers, as GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to link correctly, so linking tests should pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Jason Ekstrand	3cbc02e469	intel: Use TXS for image_size when we have a typed surface Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	09f1de97a7	anv,i965: Lower away image derefs in the driver Previously, the back-end compiler turn image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	37f7983bcc	intel/compiler: Do image load/store lowering to NIR This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Ian Romanick	c856403868	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1 v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. No changes on any other Intel platforms. Skylake total instructions in shared programs: 14304261 -> 14304241 (<.01%) instructions in affected programs: 1625 -> 1605 (-1.23%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -15.91% 4.19% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527531226 -> 527531194 (<.01%) cycles in affected programs: 92204 -> 92172 (-0.03%) helped: 2 HURT: 0 Haswell and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 14615730 -> 14615710 (<.01%) instructions in affected programs: 1838 -> 1818 (-1.09%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -14.59% 3.85% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:46 -07:00
Ian Romanick	b6e247cf0e	i965/fs: Refactor image atomics to be a bit more like other atomics This greatly simplifies the next patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:46 -07:00
Ian Romanick	fabe3ead57	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 Funny story... a single shader was hurt for instructions, spills, fills. That same shader was also the most helped for cycles. #GPUsAreWeird No changes on any other Intel platform. v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14304116 -> 14304261 (<.01%) instructions in affected programs: 12776 -> 12921 (1.13%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1 helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55% HURT stats (abs) min: 189 max: 189 x̄: 189.00 x̃: 189 HURT stats (rel) min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87% 95% mean confidence interval for instructions value: -12.83 27.33 95% mean confidence interval for instructions %-change: -1.57% 0.31% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527552861 -> 527531226 (<.01%) cycles in affected programs: 1459195 -> 1437560 (-1.48%) helped: 16 HURT: 2 helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6 helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03% HURT stats (abs) min: 12 max: 12 x̄: 12.00 x̃: 12 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -3699.81 1295.92 95% mean confidence interval for cycles %-change: -0.94% 0.30% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8025 -> 8033 (0.10%) spills in affected programs: 208 -> 216 (3.85%) helped: 1 HURT: 1 total fills in shared programs: 10989 -> 11040 (0.46%) fills in affected programs: 444 -> 495 (11.49%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11709181 -> 11709153 (<.01%) instructions in affected programs: 3505 -> 3477 (-0.80%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4 helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61% total cycles in shared programs: 254741126 -> 254738801 (<.01%) cycles in affected programs: 919067 -> 916742 (-0.25%) helped: 3 HURT: 0 helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160 helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03% total spills in shared programs: 4536 -> 4533 (-0.07%) spills in affected programs: 40 -> 37 (-7.50%) helped: 1 HURT: 0 total fills in shared programs: 4819 -> 4813 (-0.12%) fills in affected programs: 94 -> 88 (-6.38%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:38 -07:00
Kevin Rogovin	03ecec9ed2	i965: Add INTEL_fragment_shader_ordering support. Adds suppport for INTEL_fragment_shader_ordering. We achieve the fragment ordering by using the same instruction as for beginInvocationInterlockARB() which is by issuing a memory fence via sendc. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00

1 2 3 4

171 commits