Commit graph

66 commits

Author SHA1 Message Date
Karol Herbst
e5899c1e88 nir: rename nir_op_fne to nir_op_fneu
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
2020-08-21 17:26:21 +00:00
Marcin Ślusarz
cb19fe24d3 intel/vec4: fix out of bounds read
NIR_MAX_VEC_COMPONENTS was bumped from 4 to 16 in a8ec4082
(2019.03.09, merged 2019.12.21)

float[4] array was added in acd7796a
(2019.06.11, merged 2019.07.11)

Found by Coverity.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3014

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Fixes: a8ec4082a4 ("nir+vtn: vec8+vec16 support")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6067>
2020-07-30 10:41:00 +00:00
Boris Brezillon
345b5847b4 nir: Replace the scoped_memory barrier by a scoped_barrier
SPIRV OpControlBarrier can have both a memory and a control barrier
which some hardware can handle with a single instruction. Let's
turn the scoped_memory_barrier into a scoped barrier which can embed
both barrier types. Note that control-only or memory-only barriers can
be supported through this new intrinsic by passing NIR_SCOPE_NONE to the
unused barrier type.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
2020-06-03 07:39:52 +00:00
Caio Marcelo de Oliveira Filho
f858fa26b4 intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect will have.

Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separatedly
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

while ICL also has not change on instruction count but report changes
negative changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Kenneth Graunke
4459a70a6e intel/compiler: Delete abs/neg handling in fsign code
This should have gone away when removing source modifiers.  They won't
be set any longer, so this is simply dead code.

Fixes: b7c47c4f7c ("intel/compiler: Drop nir_lower_to_source_mods() and related handling.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4691>
2020-04-22 17:04:37 -07:00
Kenneth Graunke
b7c47c4f7c intel/compiler: Drop nir_lower_to_source_mods() and related handling.
I think we're unanimous in wanting to drop nir_lower_to_source_mods.
It's a bit of complexity to handle in the backend, but perhaps more
importantly, would be even more complexity to handle in nir_search.

And, it turns out that since we made other compiler improvements in the
last few years, they no longer appear to buy us anything of value.
Summarizing the results from shader-db from this patch:

 - Icelake (scalar mode)

   Instruction counts:

   - 411 helped, 598 hurt (out of 139,470 shaders)
   - 99.2% of shaders remain unaffected.  The average increase in
     instruction count in hurt programs is 1.78 instructions.
   - total instructions in shared programs: 17214951 -> 17215206 (<.01%)
   - instructions in affected programs: 1143879 -> 1144134 (0.02%)

   Cycles:

   - 1042 helped, 1357 hurt
   - total cycles in shared programs: 365613294 -> 365882263 (0.07%)
   - cycles in affected programs: 138155497 -> 138424466 (0.19%)

 - Haswell (both scalar and vector modes)

   Instruction counts:

   - 73 helped, 1680 hurt (out of 139,470 shaders)
   - 98.7% of shaders remain unaffected.  The average increase in
     instruction count in hurt programs is 1.9 instructions.
   - total instructions in shared programs: 14199527 -> 14202262 (0.02%)
   - instructions in affected programs: 446499 -> 449234 (0.61%)

   Cycles:

   - 5253 helped, 5559 hurt
   - total cycles in shared programs: 359996545 -> 360038731 (0.01%)
   - cycles in affected programs: 155897127 -> 155939313 (0.03%)

Given that ~99% of shader-db remains unaffected, and the affected
programs are hurt by about 1-2 instructions - which are all cheap
ALU instructions - this is unlikely to be measurable in terms of
any real performance impact that would affect users.

So, drop them and simplify the backend, and hopefully enable other
future simplifications in NIR.

Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4616>
2020-04-21 21:42:21 +00:00
Jason Ekstrand
7c43b8ce1b nir: Delete the fnoise opcodes
As of the previous commit, they are never used.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
2020-04-21 06:16:13 +00:00
Tapani Pälli
da76dfb515 intel/vec4: fix valgrind errors with vf_values array
Fixes valgrind errors introduced since commit a8ec4082.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2346
Fixes: a8ec4082 ("nir+vtn: vec8+vec16 support")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691>
2020-02-07 09:06:18 +00:00
Matt Turner
88a0523bd2 intel/compiler: Move Gen4/5 rounding to visitor
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).

Since this requires a second instruciton (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.

To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.

I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:

      sat(trunc(-0.5)) = 0.0

But on Gen4/5 where sat(trunc(x)) is implemented as

      rndz.r.f0  result, x             // result = floor(x)
                                       // set f0 if increment needed
      (+f0) add  result, result, 1.0   // fixup so result = trunc(x)

then putting saturate on both instructions will give the wrong result.

      floor(-0.5) = -1.0
      sat(floor(-0.5)) = 0.0
      // +1 increment would be needed since floor(-0.5) != trunc(-0.5)
      sat(sat(floor(-0.5)) + 1.0) = 1.0

Fixes: 6f394343b1 ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
2020-01-22 23:47:02 +00:00
Jason Ekstrand
ada49bae5e intel/vec4: Support scoped_memory_barrier
Fixes: 06aecb14c0 "anv: Implement VK_KHR_vulkan_memory_model"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:46 +00:00
Caio Marcelo de Oliveira Filho
766fdeccf9 intel/vec4: Fix lowering of multiplication by 16-bit constant
Existing code was ignoring whether the type of the immediate source
was signed or not.  If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).

Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older
platforms that don't support MUL with 32x32 types and use vec4.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-17 10:45:22 -08:00
Jason Ekstrand
24c0545b2d intel/vec4: Set brw_stage_prog_data::has_ubo_pull
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur.  Unfortunately, he neglected to set
the bit in the vec4 back-end.  This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable.  We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.

Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 16:05:57 +00:00
Ian Romanick
92252219d3 intel/vec4: Don't try both sources as immediates for DPH
DPH isn't actually commutative, so this doesn't work.  If the immediate
in src0 would be a VF candidate, we could do better. *shrug*

No shader-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b04beaf41d ("intel/vec4: Try both sources as candidates for being immediates")
2019-10-17 15:07:01 -07:00
Caio Marcelo de Oliveira Filho
f7d90c67c7 intel/compiler: Silence maybe-uninitialized warning in GCC 9.1.1
Compiler can't see that d is initialized.

    ../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
    ../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      984 |          d = MAX2(-d, d);

Assert that we expect at least one component -- hence d going to be
set.  That by itself is not enough, so also zero initialize the
variable.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-23 13:25:27 -07:00
Jason Ekstrand
021fa28163 intel/nir: Add a helper for getting BRW_AOP from an intrinsic
So many duplicated switch statements....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-21 17:19:55 +00:00
Jason Ekstrand
c02c3ff612 intel/nir: Add a common nir comparison -> cmod helper
We already had one in the vec4 code, we just had move it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-03 00:35:48 +00:00
Jason Ekstrand
b539157504 intel/vec4: Drop all of the 64-bit varying code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Eric Engestrom
abc226cf41 tree-wide: replace MAYBE_UNUSED with ASSERTED
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Ian Romanick
3a1fdca5ad intel/vec4: Try to emit immediate sources for MOV
Per the comment in vec4_visitor::nir_emit_load_const, further
improvement is possible in this area.  That case would be more
complicated as I think we'd want to check that all users of the
nir_load_const_instr result intended to use the value as float.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This commit is about twice as helpful since b04beaf41d
("intel/vec4: Try both sources as candidates for being immediates").

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13478598 -> 13474068 (-0.03%)
instructions in affected programs: 589452 -> 584922 (-0.77%)
helped: 2773
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83%
95% mean confidence interval for instructions value: -1.67 -1.60
95% mean confidence interval for instructions %-change: -0.98% -0.94%
Instructions are helped.

total cycles in shared programs: 376386916 -> 376369392 (<.01%)
cycles in affected programs: 16871628 -> 16854104 (-0.10%)
helped: 2293
HURT: 523
helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2
helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36%
HURT stats (abs)   min: 2 max: 316 x̄: 26.99 x̃: 14
HURT stats (rel)   min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43%
95% mean confidence interval for cycles value: -7.87 -4.58
95% mean confidence interval for cycles %-change: -0.52% -0.34%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10860328 -> 10857675 (-0.02%)
instructions in affected programs: 335907 -> 333254 (-0.79%)
helped: 1639
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70%
95% mean confidence interval for instructions value: -1.67 -1.57
95% mean confidence interval for instructions %-change: -0.89% -0.84%
Instructions are helped.

total cycles in shared programs: 153942720 -> 153934120 (<.01%)
cycles in affected programs: 5604818 -> 5596218 (-0.15%)
helped: 1494
HURT: 97
helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2
helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18%
HURT stats (abs)   min: 2 max: 160 x̄: 32.02 x̃: 20
HURT stats (rel)   min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56%
95% mean confidence interval for cycles value: -6.45 -4.36
95% mean confidence interval for cycles %-change: -0.32% -0.23%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139378 -> 8137267 (-0.03%)
instructions in affected programs: 265616 -> 263505 (-0.79%)
helped: 1148
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1
helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62%
95% mean confidence interval for instructions value: -1.90 -1.78
95% mean confidence interval for instructions %-change: -0.90% -0.83%
Instructions are helped.

total cycles in shared programs: 188541756 -> 188537540 (<.01%)
cycles in affected programs: 9807004 -> 9802788 (-0.04%)
helped: 1143
HURT: 4
helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2
helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
95% mean confidence interval for cycles value: -3.80 -3.55
95% mean confidence interval for cycles %-change: -0.14% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
acd7796a07 intel/vec4: Try to emit a VF source in try_immediate_source
This commit is also a pre-requisite for the next commit.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This change is a lot less helpful since that commit
landed (previously helped 1934 shaders on HSW) because, apparently, a
lot of the cases helped by that commit were things like vector loads of
{ 1.0, 1.0, 1.0 } that were also helped by this commit.

Haswell
total instructions in shared programs: 13480095 -> 13478598 (-0.01%)
instructions in affected programs: 229534 -> 228037 (-0.65%)
helped: 1006
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09%
95% mean confidence interval for instructions value: -1.54 -1.43
95% mean confidence interval for instructions %-change: -1.15% -1.07%
Instructions are helped.

total cycles in shared programs: 376385734 -> 376386916 (<.01%)
cycles in affected programs: 14101380 -> 14102562 (<.01%)
helped: 941
HURT: 56
helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2
helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 115.50 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44%
95% mean confidence interval for cycles value: -2.06 4.43
95% mean confidence interval for cycles %-change: -0.47% -0.39%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 12048004 -> 12046589 (-0.01%)
instructions in affected programs: 217072 -> 215657 (-0.65%)
helped: 934
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11%
95% mean confidence interval for instructions value: -1.57 -1.46
95% mean confidence interval for instructions %-change: -1.18% -1.10%
Instructions are helped.

total cycles in shared programs: 180285854 -> 180287608 (<.01%)
cycles in affected programs: 14103824 -> 14105578 (0.01%)
helped: 871
HURT: 53
helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2
helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 123.66 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46%
95% mean confidence interval for cycles value: -1.60 5.39
95% mean confidence interval for cycles %-change: -0.46% -0.37%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10861227 -> 10860328 (<.01%)
instructions in affected programs: 92969 -> 92070 (-0.97%)
helped: 624
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95%
95% mean confidence interval for instructions value: -1.52 -1.36
95% mean confidence interval for instructions %-change: -1.09% -1.01%
Instructions are helped.

total cycles in shared programs: 153944316 -> 153942720 (<.01%)
cycles in affected programs: 1640956 -> 1639360 (-0.10%)
helped: 601
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2
helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08%
HURT stats (abs)   min: 2 max: 72 x̄: 36.13 x̃: 36
HURT stats (rel)   min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00%
95% mean confidence interval for cycles value: -3.44 -1.74
95% mean confidence interval for cycles %-change: -0.18% -0.09%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139924 -> 8139378 (<.01%)
instructions in affected programs: 69776 -> 69230 (-0.78%)
helped: 322
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1
helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54%
95% mean confidence interval for instructions value: -1.88 -1.51
95% mean confidence interval for instructions %-change: -0.85% -0.72%
Instructions are helped.

total cycles in shared programs: 188542864 -> 188541756 (<.01%)
cycles in affected programs: 3031532 -> 3030424 (-0.04%)
helped: 320
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2
helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -3.85 -3.07
95% mean confidence interval for cycles %-change: -0.06% -0.05%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
365b45d571 intel/vec4: Try to emit a single load for multiple 3-src instruction operands
If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register.  Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"

All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.

total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).

No shader-db changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
6f6bc842f6 intel/vec4: Refactor operand fixing for ffma and flrp
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
dd2dc7e707 intel/vec4: Delete vec4_visitor::emit_lrp
Effectivley unused since dd7135d55d ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5").  I had intended to
remove this code as part of that series, but I forgot.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00
Ian Romanick
b04beaf41d intel/vec4: Try both sources as candidates for being immediates
For some reason, when I first wrote try_immediate_source, I thought the
sources had already been ordered so that the immediate value was the
second source.  That's rubbish.  The generator assumes *neither* source
is immediate, and it relies on later copy/constant propagation passes to
do the reordering.

For this reason, the changes to try_immediate_source have to go to some
efforts to reorder the operands and tell the caller when it reordered
them.  The generator for comparison instructions uses this to determine
when the comparison needs to change (e.g., from GT to LT).

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13484431 -> 13480500 (-0.03%)
instructions in affected programs: 441138 -> 437207 (-0.89%)
helped: 1883
HURT: 0
helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1
helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90%
95% mean confidence interval for instructions value: -2.19 -1.98
95% mean confidence interval for instructions %-change: -1.14% -1.06%
Instructions are helped.

total cycles in shared programs: 376420286 -> 376406400 (<.01%)
cycles in affected programs: 15995668 -> 15981782 (-0.09%)
helped: 1692
HURT: 219
helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4
helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35%
HURT stats (abs)   min: 2 max: 516 x̄: 43.09 x̃: 22
HURT stats (rel)   min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13%
95% mean confidence interval for cycles value: -9.70 -4.83
95% mean confidence interval for cycles %-change: -0.42% -0.28%
Cycles are helped.

total spills in shared programs: 23166 -> 23158 (-0.03%)
spills in affected programs: 66 -> 58 (-12.12%)
helped: 2
HURT: 0

total fills in shared programs: 34592 -> 34580 (-0.03%)
fills in affected programs: 75 -> 63 (-16.00%)
helped: 2
HURT: 0

Ivy Bridge
total instructions in shared programs: 12051590 -> 12048513 (-0.03%)
instructions in affected programs: 355911 -> 352834 (-0.86%)
helped: 1481
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1
helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90%
95% mean confidence interval for instructions value: -2.17 -1.98
95% mean confidence interval for instructions %-change: -1.12% -1.04%
Instructions are helped.

total cycles in shared programs: 180319624 -> 180307642 (<.01%)
cycles in affected programs: 15591028 -> 15579046 (-0.08%)
helped: 1340
HURT: 174
helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2
helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32%
HURT stats (abs)   min: 2 max: 518 x̄: 40.41 x̃: 14
HURT stats (rel)   min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67%
95% mean confidence interval for cycles value: -10.85 -4.97
95% mean confidence interval for cycles %-change: -0.45% -0.31%
Cycles are helped.

All Gen6 and earlier platforms had simlar results. (Sandy Bridge shown)
total instructions in shared programs: 10863159 -> 10861462 (-0.02%)
instructions in affected programs: 157839 -> 156142 (-1.08%)
helped: 715
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2
helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85%
95% mean confidence interval for instructions value: -2.53 -2.21
95% mean confidence interval for instructions %-change: -1.13% -1.02%
Instructions are helped.

total cycles in shared programs: 153957782 -> 153948778 (<.01%)
cycles in affected programs: 3171648 -> 3162644 (-0.28%)
helped: 696
HURT: 62
helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4
helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12%
HURT stats (abs)   min: 2 max: 300 x̄: 31.29 x̃: 2
HURT stats (rel)   min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34%
95% mean confidence interval for cycles value: -15.65 -8.11
95% mean confidence interval for cycles %-change: -0.56% -0.36%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 18:13:18 -07:00
Ian Romanick
379cf3bb87 intel/vec4: Try immediate sources for dot products too
No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

All Haswell and earlier platforms has similar results. (Haswell shown)
total instructions in shared programs: 13484467 -> 13484431 (<.01%)
instructions in affected programs: 8540 -> 8504 (-0.42%)
helped: 33
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1
helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35%
95% mean confidence interval for instructions value: -1.19 -0.99
95% mean confidence interval for instructions %-change: -0.60% -0.38%
Instructions are helped.

total cycles in shared programs: 376420572 -> 376420286 (<.01%)
cycles in affected programs: 56260 -> 55974 (-0.51%)
helped: 26
HURT: 5
helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2
helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13%
HURT stats (abs)   min: 2 max: 6 x̄: 4.40 x̃: 6
HURT stats (rel)   min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35%
95% mean confidence interval for cycles value: -22.91 4.45
95% mean confidence interval for cycles %-change: -0.56% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 17:16:16 -07:00
Ian Romanick
eeebeb211f intel/vec4: Try emitting non-scalar immediates
Sometimes an instruction has a vector as a source, but all of the
components have the same value.  For example,

    vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0)
    ...
    vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13487811 -> 13484467 (-0.02%)
instructions in affected programs: 421981 -> 418637 (-0.79%)
helped: 1859
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84%
95% mean confidence interval for instructions value: -1.85 -1.74
95% mean confidence interval for instructions %-change: -1.07% -1.00%
Instructions are helped.

total cycles in shared programs: 376423252 -> 376420572 (<.01%)
cycles in affected programs: 14800970 -> 14798290 (-0.02%)
helped: 1519
HURT: 329
helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4
helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36%
HURT stats (abs)   min: 2 max: 598 x̄: 40.74 x̃: 16
HURT stats (rel)   min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98%
95% mean confidence interval for cycles value: -3.53 0.63
95% mean confidence interval for cycles %-change: -0.30% -0.09%
Inconclusive result (value mean confidence interval includes 0).

total fills in shared programs: 34601 -> 34592 (-0.03%)
fills in affected programs: 91 -> 82 (-9.89%)
helped: 9
HURT: 0

Ivy Bridge
total instructions in shared programs: 12053565 -> 12051626 (-0.02%)
instructions in affected programs: 298103 -> 296164 (-0.65%)
helped: 1228
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81%
95% mean confidence interval for instructions value: -1.63 -1.53
95% mean confidence interval for instructions %-change: -0.95% -0.88%
Instructions are helped.

total cycles in shared programs: 180322270 -> 180319922 (<.01%)
cycles in affected programs: 14123840 -> 14121492 (-0.02%)
helped: 1036
HURT: 195
helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2
helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35%
HURT stats (abs)   min: 2 max: 598 x̄: 51.33 x̃: 16
HURT stats (rel)   min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72%
95% mean confidence interval for cycles value: -4.92 1.10
95% mean confidence interval for cycles %-change: -0.35% -0.07%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10864286 -> 10863189 (-0.01%)
instructions in affected programs: 159722 -> 158625 (-0.69%)
helped: 724
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1
helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62%
95% mean confidence interval for instructions value: -1.58 -1.46
95% mean confidence interval for instructions %-change: -0.82% -0.75%
Instructions are helped.

total cycles in shared programs: 153967938 -> 153957926 (<.01%)
cycles in affected programs: 1923186 -> 1913174 (-0.52%)
helped: 654
HURT: 56
helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4
helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18%
HURT stats (abs)   min: 2 max: 390 x̄: 54.75 x̃: 32
HURT stats (rel)   min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92%
95% mean confidence interval for cycles value: -17.42 -10.78
95% mean confidence interval for cycles %-change: -0.76% -0.40%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8142677 -> 8141721 (-0.01%)
instructions in affected programs: 139511 -> 138555 (-0.69%)
helped: 588
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46%
95% mean confidence interval for instructions value: -1.70 -1.55
95% mean confidence interval for instructions %-change: -0.89% -0.78%
Instructions are helped.

total cycles in shared programs: 188549394 -> 188547676 (<.01%)
cycles in affected programs: 3171960 -> 3170242 (-0.05%)
helped: 527
HURT: 0
helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2
helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06%
95% mean confidence interval for cycles value: -3.49 -3.03
95% mean confidence interval for cycles %-change: -0.09% -0.07%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 17:16:06 -07:00
Jason Ekstrand
859de4a748 intel/fs,vec4: Use g0 as the header for MFENCE
We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Jason Ekstrand
f2dc0f2872 nir: Drop imov/fmov in favor of one mov instruction
The difference between imov and fmov has been a constant source of
confusion in NIR for years.  No one really knows why we have two or when
to use one vs. the other.  The real reason is that they do different
things in the presence of source and destination modifiers.  However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
8ffbb54405 intel: Implement abs, neg, and sat in the back-end
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Karol Herbst
14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Jason Ekstrand
6b1c398bcb intel/nir: Take a nir_tex_instr and src index in brw_texture_offset
This makes things a bit simpler and it's also more robust because it no
longer has a hard dependency on the offset being a 32-bit value.
2019-04-14 22:25:56 +02:00
Jason Ekstrand
61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Ian Romanick
d2056ab993 intel/vec4: Emit constants for some ALU sources as immediate values
In some cases of flow control, the constant propagation is not able to
determine that the source of an instruction must be a constant value.
When we still have NIR SSA values, we can easily determine this.  Emit
the immediate value during code generation to possible avoid spurious
loads of constants into registers.

I wrote this patch to prevent a couple trivial regressions in vec4
shaders caused by "nir/algebraic: Replace i2b used by bcsel or
if-statement with comparison".  The final result was quite a bit better
than that...

No shader-db changes on any Gen8+ platform.

v2: Assert that we never get a negation source modifier on Gen8+.
Suggested by Ken.  This should never happen because we don't normally
use vec4 for Gen8+ (requires and environment variable to force it), and
there's no code to generate these negations.  Still, erring on the side
of caution is better.

Haswell
total instructions in shared programs: 13776218 -> 13764783 (-0.08%)
instructions in affected programs: 663931 -> 652496 (-1.72%)
helped: 3495
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49%
HURT stats (abs)   min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel)   min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24%
95% mean confidence interval for instructions value: -3.39 -3.15
95% mean confidence interval for instructions %-change: -1.84% -1.75%
Instructions are helped.

total cycles in shared programs: 386818984 -> 386511910 (-0.08%)
cycles in affected programs: 20379636 -> 20072562 (-1.51%)
helped: 3052
HURT: 476
helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6
helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69%
HURT stats (abs)   min: 2 max: 416 x̄: 62.76 x̃: 24
HURT stats (rel)   min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18%
95% mean confidence interval for cycles value: -115.57 -58.51
95% mean confidence interval for cycles %-change: -0.93% -0.73%
Cycles are helped.

total spills in shared programs: 100482 -> 100480 (<.01%)
spills in affected programs: 79 -> 77 (-2.53%)
helped: 3
HURT: 1

total fills in shared programs: 96883 -> 96877 (<.01%)
fills in affected programs: 85 -> 79 (-7.06%)
helped: 4
HURT: 0

Ivy Bridge
total instructions in shared programs: 12000562 -> 11990113 (-0.09%)
instructions in affected programs: 572581 -> 562132 (-1.82%)
helped: 3106
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49%
95% mean confidence interval for instructions value: -3.49 -3.23
95% mean confidence interval for instructions %-change: -1.91% -1.81%
Instructions are helped.

total cycles in shared programs: 180958504 -> 180664500 (-0.16%)
cycles in affected programs: 19991810 -> 19697806 (-1.47%)
helped: 2654
HURT: 486
helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6
helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68%
HURT stats (abs)   min: 2 max: 396 x̄: 59.18 x̃: 24
HURT stats (rel)   min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16%
95% mean confidence interval for cycles value: -125.62 -61.64
95% mean confidence interval for cycles %-change: -0.76% -0.56%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10842336 -> 10835438 (-0.06%)
instructions in affected programs: 395340 -> 388442 (-1.74%)
helped: 1926
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2
helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42%
95% mean confidence interval for instructions value: -3.73 -3.43
95% mean confidence interval for instructions %-change: -1.84% -1.72%
Instructions are helped.

total cycles in shared programs: 154590074 -> 154569050 (-0.01%)
cycles in affected programs: 8159932 -> 8138908 (-0.26%)
helped: 1670
HURT: 228
helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6
helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28%
HURT stats (abs)   min: 2 max: 1798 x̄: 40.58 x̃: 14
HURT stats (rel)   min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31%
95% mean confidence interval for cycles value: -13.51 -8.64
95% mean confidence interval for cycles %-change: -0.60% -0.46%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8212357 -> 8206587 (-0.07%)
instructions in affected programs: 323664 -> 317894 (-1.78%)
helped: 1457
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3
helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44%
95% mean confidence interval for instructions value: -4.14 -3.78
95% mean confidence interval for instructions %-change: -1.93% -1.78%
Instructions are helped.

total cycles in shared programs: 187668016 -> 187657422 (<.01%)
cycles in affected programs: 14856234 -> 14845640 (-0.07%)
helped: 1372
HURT: 83
helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6
helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08%
HURT stats (abs)   min: 2 max: 14 x̄: 3.20 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for cycles value: -7.65 -6.91
95% mean confidence interval for cycles %-change: -0.11% -0.10%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:41:46 -08:00
Eric Anholt
8f3694e1ab intel: Use the NIR lowering for isign.
Drops one instruction from fs-sign-int.shader_test.  No change in
shader-db due to it having 0 instances of sign(genIType).  This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".

v2: Update commit message about shader-db.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2019-02-14 00:32:30 +00:00
Kenneth Graunke
04c2f12ab2 i965: Drop mark_surface_used mechanism.
The original idea was that the backend compiler could eliminate
surfaces, so we would have it mark which ones are actually used,
then shrink the binding table accordingly.  Unfortunately, it's a
pretty blunt mechanism - it can only prune things from the end,
not the middle - since we decide the layout before we even start
the backend compiler, and only limit the size.  It also basically
gives up if it sees indirect array access.

Besides, we do the vast majority of our surface elimination in NIR
anyway, not the backend - and I don't see that trend changing any
time soon.  Vulkan abandoned this plan a long time ago, and I don't
use it in Iris, but it's still been kicking around in i965.

I hacked shader-db to print the binding table size in bytes, and
observed no changes with this patch.  So, this code appears to do
nothing useful.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-13 09:35:32 -08:00
Jason Ekstrand
80e8dfe9de nir: Rename Boolean-related opcodes to include 32 in the name
This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

     Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
455ec7327d i965/vec4: Implement nir_op_uadd_sat
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
2018-12-13 17:49:48 +00:00
Jason Ekstrand
dca6cd9ce6 nir: Make boolean conversions sized just like the others
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64.  This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:07 -06:00
Jason Ekstrand
dca35c598d intel/fs,vec4: Fix a compiler warning
../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
       assert(nir_intrinsic_write_mask(instr) ==
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
              (1 << instr->num_components) - 1);
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This was caused by 6339aba775 which added these completely valid
checks.  However clang likes to complain about signedness mismatches.

Fixes: 6339aba775 "intel/compiler: Lower SSBO and shared..."
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-11-19 09:57:41 -06:00
Jason Ekstrand
6339aba775 intel/compiler: Lower SSBO and shared loads/stores in NIR
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally.  It also means we can easily share the code between the vec4
and FS back-ends if we wish.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-11-15 19:59:49 -06:00
Jason Ekstrand
1413512b4c intel/vec4: Use the new nir_src_is_const and friends
As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-11-08 10:09:25 -06:00
Jason Ekstrand
6b2918709a intel/fs,vec4: Clean up a repeated pattern with SSBOs
Everywhere we handle SSBO intrinsics, we have exactly the same pattern
for computing the index so we may as well make a helper for it.  We also
add a get_nir_src_imm to vec4 and use it for SSBO offsets.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-11-08 10:09:06 -06:00
Jason Ekstrand
d3a0d8b750 intel/compiler: Stop assuming the entrypoint is called "main"
This isn't true for Vulkan so we have to whack it to "main" in anv which
is silly.  Instead of walking the list of functions and asserting that
everything is named "main" and hoping there's only one function named
"main", just use the nir_shader_get_entrypoint() helper which has better
assertions anyway.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-10-30 20:14:52 -05:00
Jason Ekstrand
0e0dc596a2 intel/vec4: Fix nir_op_b2[fi] with 64-bit result
This is valid NIR but you can't actually hit this case today.  GLSL IR
doesn't have a bool to double opcode; it does f2d(b2f(x)).  In SPIR-V we
don't have any to/from bool conversion opcodes at all.  However, the
next commit will make us start generating it so we should be ready.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-10-11 15:21:19 -05:00
Ian Romanick
b44c9292b7 intel/compiler: Don't handle fsign.sat
No shader-db or CI changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-10-09 13:56:42 -07:00
Ian Romanick
c836326a29 i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:50 -07:00
Ian Romanick
9626ea497d i965/vec4: Properly handle sign(-abs(x))
This is achived by copying the sign(abs(x)) optimization from the FS
backend.

On Gen7 an earlier platforms, this fixes new piglit tests:

 - glsl-1.10/execution/vs-sign-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:07 -07:00
Ian Romanick
965a06dbd7 i965/vec4: Make the vec4_visitor::nir_emit_instr default case unreachable
The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI.  Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.

An alternate solution is to delete the default case so that the compiler
will issue a warning.  That may require more work since there are other
(impossible) cases that exist.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-05 21:13:32 -07:00
Kenneth Graunke
a1afef8de0 i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.
These are the same, we don't need a separate opcode enum per backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-30 20:30:34 -08:00
Iago Toral Quiroga
8620f7ebbc i965/vec4: use a temp register to compute offsets for pull loads
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.

That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).

If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.

Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components

Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
2017-11-30 07:57:53 +01:00