Other places will need to do this soon to properly handle source
swizzles. The patch looks a little odd, but the change is pretty
straightforward. All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.
I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive. I am open to
suggestions of alternatives.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To allow cmod propagation from a MOV in a sequence like:
and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD
mov.nz.f0(16) null<1>F g31<8,8,1>D
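Roughly speaking, the flag write can then be folded back into the logical
op and the MOV dropped, giving something like:
   and.nz.f0(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD
(Illustrative; the pass still has to check that the types and condition
make the propagation safe.)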
A similar change to the vec4 backend had no effect.
Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.
A later patch, "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there. Prefix them with VEC4 so
it's clear that they're only used by the vec4 back-end.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops. There's no reason to keep these opcodes around
anymore.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Since switching to SHADER_OPCODE_SEND for image operations, we no longer
need the non-logical opcode.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes. The only thing that the surface
builder is providing at this point is extra levels of functions to call
through. I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It makes more sense to start at the surface, then move on to the address,
and then the data. Also, this is a really good test of whether we caught
all the places that refer to the sources by explicit integer index.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The optimization in 4cd1a0be76 introduced a replacement of:
cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D
...
cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D
by:
cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D
...
mov(8) vgrf11.y:D, vgrf15.yyyy:D
The first cmp instruction stores to .x while the second mov sources
from .y. We need to take into account where the replacement of the
scan_inst destination is going to store its result so that the
replacement mov can source from the correct location.
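With the fix, the replacement mov reads from the channel that the new cmp
actually wrote, i.e. something along the lines of:
   mov(8) vgrf11.y:D, vgrf15.xxxx:D
(Illustrative; the actual swizzle is derived from where the replacement of
the scan_inst destination stores its result.)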
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cd1a0be76 ("i965/vec4: Propagate conditional modifiers from more compares to other compares")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add support to the i965 DRI driver to sample from XYUV8888 buffers.
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself, so just calling
brw_inst_set_eot is insufficient. Moving the EOT handling into the
send_indirect_[split]_message helper lets us handle it properly.
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence, so return false from
has_dst_aligned_region_restriction() for such instructions as a
micro-optimization.
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region. The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.
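For illustration (not necessarily the exact encoding the pass picks), a
source with an element stride of 8 dwords, which is too large for the
horizontal stride alone, can be read through the vertical stride as a
region like:
   g10<8;1,0>:d (vstride 8, width 1, hstride 0)
i.e. rows of a single element spaced 8 elements (32B for :d) apart.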
An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.
This showed up as a regression from my commit cbea91eb57 in Vulkan 1.1
CTS tests on CHV/BXT platforms; however, it was really a pre-existing
problem that had affected conformance on other platforms
without native support for integer multiplication. CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering. Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).
According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185 and then (mistakenly?) reverted as
a31d038208.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four. Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.
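For example (with purely illustrative registers), if the original
destination happened to be g10<4>:d, preserving its alignment would mean
writing the low temporary as a word-typed region with twice that stride,
roughly g12<8>:uw, which exceeds the maximum legal destination stride of
four.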
Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial because the fix relies on the
recently introduced regioning lowering pass.
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the execution type calculation will return a bogus value in
cases like:
mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.
Similarly, there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions. Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at the codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrarily strided region.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (idr): Move adding the test to after adding the fix. Reordering the
two commits prevents possible headaches for git-bisect with scripts that
always do 'ninja check'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Fix silly bug in logic. s/||/&&/
All but one of the affected shaders is in an Unreal4 demo. The other is
in Tomb Raider. All of the cases that Ian investigated appear to be
sequences like the following:
if (int(uint(some_float)) < 0) /* other relations too */
...
At least in Tomb Raider, it's not obvious that this sequence came from
the original shader.
In some of the Unreal demos, the shader contains code like:
if (int(uint(textureLod(...))) > 0)
...
which explicitly generates the offending sequence.
All Gen6+ platforms had similar results (Skylake shown):
total instructions in shared programs: 15437170 -> 15437187 (<.01%)
instructions in affected programs: 4492 -> 4509 (0.38%)
helped: 0
HURT: 17
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.57% 0.75%
Instructions are HURT.
total cycles in shared programs: 383007996 -> 383007992 (<.01%)
cycles in affected programs: 20542 -> 20538 (-0.02%)
helped: 6
HURT: 7
helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for cycles value: -3.30 2.69
95% mean confidence interval for cycles %-change: -0.19% 0.19%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: nagrigoriadis@gmail.com
Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
We emit an FBL instruction, which only exists on Gen7+. This prevents
the test from segfaulting when run with TEST_DEBUG=1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes a bug in RuneScape where we were optimizing x >> 16 to an
extract and then negating and converting to float. The NIR-to-fs pass
was dropping the negate on the floor, breaking a geometry shader and
causing it to render nothing.
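The offending pattern is roughly the GLSL equivalent of:
   float f = float(-(x >> 16));
where the shift becomes an extract and the negate sits between the
extract and the float conversion.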
Fixes: 1f862e923c "i965/fs: Optimize float conversions of byte/word..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Drops one instruction from fs-sign-int.shader_test. No change in
shader-db due to it having 0 instances of sign(genIType). This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".
v2: Update commit message about shader-db.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The patch propagates the given scale_factors to the lowering options.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's clearer and means we don't have to update the array every time
we add an optional texture instruction argument.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The sampler will be ignored since the underlying 'ld_mcs' operation
won't use it, so just fill the field with 0 instead of the texture to
make it clearer that's the case.
This also keeps is_high_sampler() from kicking in unnecessarily, in
case we are using the operation for a texture with index >= 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For some reason, this warning only occurs for me in release builds.
In file included from src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:25:0:
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c: In function ‘brw_nir_lower_mem_access_bit_sizes’:
src/compiler/nir/nir_builder.h:501:26: warning: ‘src_swiz[2]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
alu_src.swizzle[i] = swiz[i];
~~~~~~~~~~~~~~~~~~~^~~~~~~~~
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:225:16: note: ‘src_swiz[2]’ was declared here
unsigned src_swiz[4];
^~~~~~~~
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8, but the code it relies on for SIMD16 mode only applies to SIMD16
instructions. If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
By just setting dst.type to src[i].type, we ensure that the offset at
the end of the loop actually offsets it by the right number of
registers. Otherwise, we'll get into a case where we copy with a Q type
and then offset with a D type and things get out of sync.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we tried to combine all cases where the instruction being
CSE'd writes to more than one MOV worth of registers into one case with
a bit of special casing for LOAD_PAYLOAD. This commit splits things so
that LOAD_PAYLOAD is entirely its own case. This makes tweaking the
LOAD_PAYLOAD case simpler in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A missing check for the shader stage in the fs_visitor would corrupt the
cs_prog_data.push information and trigger crashes / corruption later
when uploading the CS state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Surface reads don't need them because they just have the one address
payload. With surface writes, on the other hand, we can put the address
and the data in different halves and avoid building the payload
altogether.
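Schematically (this is not exact disassembly, and g_addr/g_data are just
placeholder names), the write ends up looking something like:
   sends(8) null g_addr g_data 0x... <descriptor>
with the address and the data passed as two independent register ranges
instead of being copied into one contiguous payload.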
The decrease in register pressure and added freedom in register
allocation resulting from this change reduces spilling enough to improve
the performance of one customer benchmark by about 2x.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're about to add some more if cases so let's have the giant re-indent
in its own patch to make review easier.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
These have clearly never seen any use... On gen8, the bottom 4 bits are
missing, so we need to shift them off before we call set_bits and shift
again when we get the bits. Found by inspection.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of fetching the information out of the instruction directly,
fetch the descriptor and then pluck the information out of the
descriptor. The current scheme works OK for SEND, but with SENDS it all
falls to pieces because the descriptor is completely shuffled around.
This commit doesn't actually convert everything. One notable exception
is URB messages, which don't even use descriptors in emit_urb_WRITE yet.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to be able to extract data from descriptors as well as unify a
bit of the descriptor construction.
One of the unifications we do is to unify the read/write and dataport
descriptors. On gen4-5, read/write are substantially different and the
read descriptors change between gen4 and gen4.x. On gen6, they unified
layouts between read, write, and dataport. Then, on gen8, they added
one bit to the message type field but left it reserved MBZ for
read/write messages. This commit chooses to treat that as if they
expanded the field everywhere and just didn't have enough enum values
for read/write to bother with the extra bit.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>