Commit graph

1000 commits

Author SHA1 Message Date
Jason Ekstrand
c02c3ff612 intel/nir: Add a common nir comparison -> cmod helper
We already had one in the vec4 code, we just had move it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-03 00:35:48 +00:00
Eric Engestrom
178811d8f6 meson: drop unused dep_{thread,dl}
Unused as of last commit.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Eric Engestrom
d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Francisco Jerez
54fbc625ea intel/ir: Fix CFG corruption in opt_predicated_break().
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken.  The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason.  On
top of that the list of predecessors of the block immediately after
the WHILE loop is emptied, but only one of the original edges will be
added back, which means that potentially several blocks that still
have it on their list of successors won't be on its list of
predecessors anymore, causing all sorts of hilarity due to the
inconsistency in the control flow graph.

The solution is to remove the code that's removing valid edges from
the CFG.  cfg_t::remove_block() will already clean up after itself.
The assert in bblock_t::combine_with() also needs to be removed since
we will be merging a block with multiple children into the first one
of them.

Found the issue on a hardware enabling branch originally, but
apparently somebody reproduced the same problem independently on
master in the meantime.

Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009
Cc: jiradet.jd@gmail.com
Cc: Sergii Romantsov <sergii.romantsov@globallogic.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-01 16:56:48 -07:00
Mark Janes
086c486a75 intel/device: rename gen_get_device_info
Rename the original device info initialization routine so callers
don't mistakenly call the wrong one:

  gen_get_device_info_from_fd:

      Queries kernel for full device info, including topology
      details.

  gen_get_device_info_from_pci_id:

      Partially initializes device info based on PCI ID lookup, when
      the kernel is not available.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:56 -07:00
Timothy Arceri
2afedfaf9a iris: add support for gl_ClipVertex in tess eval shaders
Required for OpenGL compat support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-01 16:12:37 -07:00
Timothy Arceri
00b5bf2d72 iris: add support for gl_ClipVertex in geometry shaders
This will enable us to support the OpenGL compat profile.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-01 16:12:27 -07:00
Jason Ekstrand
b539157504 intel/vec4: Drop all of the 64-bit varying code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Jason Ekstrand
d03ec807a4 intel/fs: Drop all of the 64-bit varying code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Jason Ekstrand
942c759059 intel: Use NIR to lower 64-bit varying access
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Eric Engestrom
abc226cf41 tree-wide: replace MAYBE_UNUSED with ASSERTED
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Jason Ekstrand
8fd2f2c276 intel/fs: Implement quad_swap_horizontal with a swizzle on gen7
This fixes dEQP-VK.subgroups.quad.compute.subgroupquadswaphorizontal_*
on all gen7 platforms.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 22:38:19 +00:00
Jason Ekstrand
499d760c6e intel/fs: Use ALIGN16 instructions for all derivatives on gen <= 7
The issue here was discovered by a set of Vulkan CTS tests:

    dEQP-VK.glsl.derivate.*.dynamic_*

These tests use ballot ops to construct a branch condition that takes
the same path for each 2x2 quad but may not be uniform across the whole
subgroup.  They then tests that derivatives work and give the correct
value even when executed inside such a branch.  Because the derivative
isn't executed in uniform control-flow and the values coming into the
derivative aren't smooth (or worse, linear), they nicely catch bugs that
aren't uncovered by simpler derivative tests.

Unfortunately, these tests require Vulkan and the equivalent GL test
would require the GL_ARB_shader_ballot extension which requires int64.
Because the requirements for these tests are so high, it's not easy to
test on older hardware and the bug is only proven to exist on gen7;
gen4-6 are a conjecture.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 22:38:19 +00:00
Matt Turner
46a3ea06be i965/fs: Print the scheduler mode.
Line wrap some awfully long lines while we are here.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-30 14:35:43 -07:00
Matt Turner
dabb5d4bee i965/fs: Add a shader_stats struct.
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.

v2 (idr): Rebase on 17 months.  Track a visitor instead of a cfg.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 14:35:43 -07:00
Jason Ekstrand
4bb6e6817e intel: Use a system value for gl_FragCoord
It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input.  It also makes zero sense because we have to
special-case it in the back-end.

Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Jason Ekstrand
e401303597 intel/fs: Remove calculate_urb_setup from fs_visitor
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Daniel Schürmann
e272fdd508 nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond)
This will effectively enable the optimization in anv.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-24 13:02:18 -05:00
Kenneth Graunke
517005b4cf i965: Use NIR to lower legacy userclipping.
This allows us to drop legacy userclip plane handling in both the vec4
and FS backends, and simplifies a few interfaces.

v2 (Jason Ekstrand):
 - Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because
   it's i965-specific.
 - Handle adding the params in brw_nir_lower_legacy_clipping
 - Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-24 18:00:13 +00:00
Jason Ekstrand
2a236c76f8 intel/compiler: Allow for required subgroup sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
4397eb91c1 intel/compiler: Allow for varying subgroup sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
c84b8eeeac intel/compiler: Be more conservative about subgroup sizes in GL
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API.  However, all GL requires is that
it's a uniform.  Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
1981460af2 intel/compiler: Lower gl_SubgroupSize in postprocess_nir
Instead of lowering the subgroup size so early, wait until we have more
information.  In particular, we're going to want different subgroup
sizes from different stages depending on the API.  We also defer
lowering of subgroup masks because the ge/gt masks require the subgroup
size to generate a subgroup mask.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
f62227f2b7 intel/nir: Make brw_nir_apply_sampler_key more generic
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Andrii Simiklit
fa2fc68de1 intel/compiler: don't use a keyword struct for a class fs_reg
warning: struct 'fs_reg' was previously declared as a class
Fixes: e64be391 ("intel/compiler: generalize the combine constants pass")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-07-24 13:26:42 +00:00
Jason Ekstrand
fa63fad333 intel/fs: Stop stack allocating large arrays
Normally, we haven't worried too much about stack sizes as Linux tends
to be fairly friendly towards large stacks.  However, when running DXVK
apps under wine, we're suddenly subject to Windows' more stringent stack
limitations and can run out of space more easily.  In particular, some
of the shaders in Elite Dangerous: Horizons have quite a few registers
and the arrays in split_virtual_grfs are large enough to blow a 1 MiB
stack leading to crashes during shader compilation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-22 16:16:39 -05:00
Caio Marcelo de Oliveira Filho
0345aeeb40 intel/compiler: Use nir_opt_conditional_discard
anv vkpipeline-db results for SKL:

total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
instructions in affected programs: 396452 -> 385272 (-2.82%)
helped: 2062
HURT: 1

total cycles in shared programs: 1458144669 -> 1458105320 (<.01%)
cycles in affected programs: 4171830 -> 4132481 (-0.94%)
helped: 1874
HURT: 180

total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 8745 -> 8748 (0.03%)
spills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1

total fills in shared programs: 23392 -> 23395 (0.01%)
fills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1

LOST:   0
GAINED: 1

No changes to shader-db on i965 or iris.  The glsl compiler already
does a similar optimization.

Improvement suggested by Daniel Schürmann.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-22 09:33:48 -07:00
Jason Ekstrand
7ceec21b76 intel/fs: Use a strided MOV instead of a conversion for load_* destinations
In many cases, the compiler can just copy-prop the strided MOV whereas
the conversion is a bit trickier.  This cuts 5% of the instructions off
of one particular Vulkan CTS test which does lots of load_ssbo.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand
68a4c796d5 intel/fs: Properly stride NULL replacement regs in DCE
This fixes some validation errors generated by certain D->W conversions
but is likely not a full solution.  Calculating an actual register
stride is a far more complex problem in general and should probably be
handled by the brw_fs_generator.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand
110669c85c st,i965: Stop looping on 64-bit lowering
Now that the 64-bit lowering passes do a complete lowering in one go, we
don't need to loop anymore.  We do, however, have to ensure that int64
lowering happens after double lowering because double lowering can
produce int64 ops.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
0ba508d7a3 nir,intel: Add support for lowering 64-bit nir_opt_extract_*
We need this when doing full software 64-bit emulation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-15 16:08:37 -05:00
Jason Ekstrand
974fabe810 intel: Run the optimization loop before and after lowering int64
For bindless SSBO access, we have to do 64-bit address calculations.  On
ICL and above, we don't have 64-bit integer support so we have to lower
the address calculations to 32-bit arithmetic.  If we don't run the
optimization loop before lowering, we won't fold any of the address
chain calculations before lowering 64-bit arithmetic and they aren't
really foldable afterwards.  This cuts the size of the generated code in
the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by
around 30%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-13 02:59:28 +00:00
Andres Gomez
f4d2be03b1 intel/compiler: remove abandoned comments
c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Ian Romanick
1259f6d802 nir: intel/vec4: Add flag to disable some algebraic optimizations
A couple patches later in this series use the flag to avoid a few
thousand shader-db regresions on all vec4 platforms.

I'm not particularly enamored with the name of this flag.  However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it.  Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.

The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
3a1fdca5ad intel/vec4: Try to emit immediate sources for MOV
Per the comment in vec4_visitor::nir_emit_load_const, further
improvement is possible in this area.  That case would be more
complicated as I think we'd want to check that all users of the
nir_load_const_instr result intended to use the value as float.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This commit is about twice as helpful since b04beaf41d
("intel/vec4: Try both sources as candidates for being immediates").

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13478598 -> 13474068 (-0.03%)
instructions in affected programs: 589452 -> 584922 (-0.77%)
helped: 2773
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83%
95% mean confidence interval for instructions value: -1.67 -1.60
95% mean confidence interval for instructions %-change: -0.98% -0.94%
Instructions are helped.

total cycles in shared programs: 376386916 -> 376369392 (<.01%)
cycles in affected programs: 16871628 -> 16854104 (-0.10%)
helped: 2293
HURT: 523
helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2
helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36%
HURT stats (abs)   min: 2 max: 316 x̄: 26.99 x̃: 14
HURT stats (rel)   min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43%
95% mean confidence interval for cycles value: -7.87 -4.58
95% mean confidence interval for cycles %-change: -0.52% -0.34%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10860328 -> 10857675 (-0.02%)
instructions in affected programs: 335907 -> 333254 (-0.79%)
helped: 1639
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70%
95% mean confidence interval for instructions value: -1.67 -1.57
95% mean confidence interval for instructions %-change: -0.89% -0.84%
Instructions are helped.

total cycles in shared programs: 153942720 -> 153934120 (<.01%)
cycles in affected programs: 5604818 -> 5596218 (-0.15%)
helped: 1494
HURT: 97
helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2
helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18%
HURT stats (abs)   min: 2 max: 160 x̄: 32.02 x̃: 20
HURT stats (rel)   min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56%
95% mean confidence interval for cycles value: -6.45 -4.36
95% mean confidence interval for cycles %-change: -0.32% -0.23%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139378 -> 8137267 (-0.03%)
instructions in affected programs: 265616 -> 263505 (-0.79%)
helped: 1148
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1
helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62%
95% mean confidence interval for instructions value: -1.90 -1.78
95% mean confidence interval for instructions %-change: -0.90% -0.83%
Instructions are helped.

total cycles in shared programs: 188541756 -> 188537540 (<.01%)
cycles in affected programs: 9807004 -> 9802788 (-0.04%)
helped: 1143
HURT: 4
helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2
helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
95% mean confidence interval for cycles value: -3.80 -3.55
95% mean confidence interval for cycles %-change: -0.14% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
acd7796a07 intel/vec4: Try to emit a VF source in try_immediate_source
This commit is also a pre-requisite for the next commit.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This change is a lot less helpful since that commit
landed (previously helped 1934 shaders on HSW) because, apparently, a
lot of the cases helped by that commit were things like vector loads of
{ 1.0, 1.0, 1.0 } that were also helped by this commit.

Haswell
total instructions in shared programs: 13480095 -> 13478598 (-0.01%)
instructions in affected programs: 229534 -> 228037 (-0.65%)
helped: 1006
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09%
95% mean confidence interval for instructions value: -1.54 -1.43
95% mean confidence interval for instructions %-change: -1.15% -1.07%
Instructions are helped.

total cycles in shared programs: 376385734 -> 376386916 (<.01%)
cycles in affected programs: 14101380 -> 14102562 (<.01%)
helped: 941
HURT: 56
helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2
helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 115.50 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44%
95% mean confidence interval for cycles value: -2.06 4.43
95% mean confidence interval for cycles %-change: -0.47% -0.39%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 12048004 -> 12046589 (-0.01%)
instructions in affected programs: 217072 -> 215657 (-0.65%)
helped: 934
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11%
95% mean confidence interval for instructions value: -1.57 -1.46
95% mean confidence interval for instructions %-change: -1.18% -1.10%
Instructions are helped.

total cycles in shared programs: 180285854 -> 180287608 (<.01%)
cycles in affected programs: 14103824 -> 14105578 (0.01%)
helped: 871
HURT: 53
helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2
helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 123.66 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46%
95% mean confidence interval for cycles value: -1.60 5.39
95% mean confidence interval for cycles %-change: -0.46% -0.37%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10861227 -> 10860328 (<.01%)
instructions in affected programs: 92969 -> 92070 (-0.97%)
helped: 624
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95%
95% mean confidence interval for instructions value: -1.52 -1.36
95% mean confidence interval for instructions %-change: -1.09% -1.01%
Instructions are helped.

total cycles in shared programs: 153944316 -> 153942720 (<.01%)
cycles in affected programs: 1640956 -> 1639360 (-0.10%)
helped: 601
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2
helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08%
HURT stats (abs)   min: 2 max: 72 x̄: 36.13 x̃: 36
HURT stats (rel)   min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00%
95% mean confidence interval for cycles value: -3.44 -1.74
95% mean confidence interval for cycles %-change: -0.18% -0.09%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139924 -> 8139378 (<.01%)
instructions in affected programs: 69776 -> 69230 (-0.78%)
helped: 322
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1
helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54%
95% mean confidence interval for instructions value: -1.88 -1.51
95% mean confidence interval for instructions %-change: -0.85% -0.72%
Instructions are helped.

total cycles in shared programs: 188542864 -> 188541756 (<.01%)
cycles in affected programs: 3031532 -> 3030424 (-0.04%)
helped: 320
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2
helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -3.85 -3.07
95% mean confidence interval for cycles %-change: -0.06% -0.05%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
365b45d571 intel/vec4: Try to emit a single load for multiple 3-src instruction operands
If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register.  Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"

All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.

total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).

No shader-db changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
6f6bc842f6 intel/vec4: Refactor operand fixing for ffma and flrp
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Caio Marcelo de Oliveira Filho
b390ff3517 intel/fs: Add support for SLM fence in Gen11
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences.  Add a way to control which fence types to use.

At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall.  With more information later
we'll be able to reduce those.

Fixes Vulkan CTS tests in ICL:

    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp

The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.

v2: Pass BTI around instead of having an enum.  (Jason)
    Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
    transformed into two.  (Jason)
    List tests fixed.  (Lionel)

v3: For clarity, split the decision of which fences to emit from the
    emission code.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-11 08:29:32 -07:00
Jason Ekstrand
14781e2122 intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data.  I'd like to add another thing or two and need a
place to put it.  This commit adds a new brw_base_prog_key struct which
contains those two common bits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Ian Romanick
dd2dc7e707 intel/vec4: Delete vec4_visitor::emit_lrp
Effectivley unused since dd7135d55d ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5").  I had intended to
remove this code as part of that series, but I forgot.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00
Ian Romanick
47c2aa5b48 intel/vec4: Reswizzle VF immediates too
Previously, an instruction like

mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]

would get rewritten as

mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]

The latter does not produce the correct result.  The VF immediate in the
second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F].  This
commit produces the former.

Fixes: 1ee1d8ab46 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Caio Marcelo de Oliveira Filho
45f5db5a84 intel/fs: Implement "demote to helper invocation"
The "demote" intrinsic works like "discard" but don't change the
control flow, allowing derivative operations to work.  This is the
semantics of D3D discard.

The "is_helper_invocation" intrinsic will return true for helper
invocations -- both the ones that started as helpers and the ones that
where demoted.  This is needed to avoid changing the behavior of
gl_HelperInvocation which is an input (so not expected to change
during shader execution).

v2: Emit the discard jump and comment why it is safe.  (Jason)
    Rework the is_helper_invocation() that was stomping f0.1.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Connor Abbott
6b28808b22 intel/nir: Extract add_const_offset_to_base
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Jason Ekstrand
fa869f45c8 intel/fs: Use nir_lower_interpolation on gen11+
On gen11, the removed the PLN instruction so we have to emit a pile of
MAD to emulate it.  We may as well do that in NIR so we can optimize and
later schedule it.

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand
2b79a9e5a5 intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand
8e7d066682 intel/fs: Actually implement the load_barycentric intrinsics
If they never get used, dead code should clean them up.  Also, we rework
the at_offset and at_sample intrinsics so they return a proper vec2
instead of returning things in PLN layout.  Fortunately, copy-prop is
pretty good at cleaning this up and it doesn't result in any actual
extra MOVs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Sagar Ghuge
1e92e83856 intel/compiler: Emit ROR and ROL instruction
v2: Reorder patch (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
83fdec0f0d intel/compiler: Enable the emission of ROR/ROL instructions
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Lionel Landwerlin
5847de6e9a intel/compiler: don't use byte operands for src1 on ICL
The simulator complains about using byte operands, we also have
documentation telling us.

Note that add operations on bytes seems to work fine on HW (like ADD).
Using dwords operands with CMP & SEL fixes the following tests :

   dEQP-VK.spirv_assembly.type.vec*.i8.*

v2: Drop the GLK changes (Matt)
    Add validator tests (Matt)

v3: Drop GLK ref (Matt)
    Don't mix float/integer in MAD (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>
2019-06-29 12:56:09 +00:00