Commit graph

1355 commits

Author SHA1 Message Date
Timothy Arceri
e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.
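
For illustration, a rough C sketch of the transform (plain C standing in for
NIR control flow; the function names are hypothetical): with the restriction
removed, an if-statement that follows a trailing continue can be moved into
the branch opposite the continue, so the continue becomes the natural end of
the iteration.

    /* Before: the pass previously left the second if-statement in place. */
    void loop_before(int n, int (*skip)(int), int (*cond)(int),
                     void (*a)(int), void (*b)(int))
    {
        for (int i = 0; i < n; i++) {
            a(i);
            if (skip(i))
                continue;
            if (cond(i))
                b(i);
        }
    }

    /* After: the if-statement sits in the branch opposite the continue,
     * so the explicit jump disappears. */
    void loop_after(int n, int (*skip)(int), int (*cond)(int),
                    void (*a)(int), void (*b)(int))
    {
        for (int i = 0; i < n; i++) {
            a(i);
            if (!skip(i)) {
                if (cond(i))
                    b(i);
            }
        }
    }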

However, Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The code size increase is related to Wolfenstein II; it seems to be largely
due to an increase in phis rather than to the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Dave Airlie
11e1fa11d6 intel/compiler: use defined size for vector components
If we increase the vector size later it would be nice to avoid
tripping over this again.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 13:59:06 +10:00
Ian Romanick
7832fb7889 intel/compiler: Use partial redundancy elimination for compares
Almost all of the hurt shaders are repeated instances of the same shader
in synmark's compilation speed tests.

shader-db results:

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256389 (<.01%)
instructions in affected programs: 54137 -> 53686 (-0.83%)
helped: 288
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74%
95% mean confidence interval for instructions value: -1.76 -1.38
95% mean confidence interval for instructions %-change: -2.47% -1.50%
Instructions are helped.

total cycles in shared programs: 372286583 -> 372283851 (<.01%)
cycles in affected programs: 833829 -> 831097 (-0.33%)
helped: 265
HURT: 16
helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4
helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35%
HURT stats (abs)   min: 2 max: 130 x̄: 24.88 x̃: 8
HURT stats (rel)   min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27%
95% mean confidence interval for cycles value: -12.30 -7.15
95% mean confidence interval for cycles %-change: -1.06% -0.64%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5038653 -> 5038495 (<.01%)
instructions in affected programs: 13939 -> 13781 (-1.13%)
helped: 50
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4
helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83%
95% mean confidence interval for instructions value: -3.73 -2.47
95% mean confidence interval for instructions %-change: -3.16% -1.21%
Instructions are helped.

total cycles in shared programs: 128118922 -> 128118228 (<.01%)
cycles in affected programs: 134906 -> 134212 (-0.51%)
helped: 50
HURT: 0
helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18
helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70%
95% mean confidence interval for cycles value: -16.54 -11.22
95% mean confidence interval for cycles %-change: -0.95% -0.53%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:53 -07:00
Danylo Piliaiev
e0db0c74b9 intel/fs: Make alpha test work with MRT and sample mask
Fix the order of src0_alpha and sample mask in fb payload.
From SKL PRM Volume 7, "Data Payload Register Order
for Render Target Write Messages":
 Type   S0A  oM  sZ  oS  M2     M3       M4
 SIMD8   1   1   0   0   s0A    oM       R
 SIMD16  1   1   0   0   1/0s0A 3/2s0A   oM

It also fixes alpha to coverage with sample mask on GEN6, since the two
are now in the correct order.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by:  Francisco Jerez <currojerez@riseup.net>
2019-03-25 13:54:55 -07:00
Danylo Piliaiev
c8abe03f3b i965,iris,anv: Make alpha to coverage work with sample mask
From "Alpha Coverage" section of SKL PRM Volume 7:
 "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
  hardware, regardless of the state setting for this feature."

From OpenGL spec 4.6, "15.2 Shader Execution":
 "The built-in integer array gl_SampleMask can be used to change
 the sample coverage for a fragment from within the shader."

From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
 "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
  is generated where each bit is determined by the alpha value at the
  corresponding sample location. The temporary coverage value is then
  ANDed with the fragment coverage value to generate a new fragment
  coverage value."

Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"

Thus we need to compute the alpha to coverage dithering manually in the
shader and replace the sample mask store with the bitwise AND of the
sample mask and the alpha to coverage dither mask.

The following formula is used to compute final sample mask:
  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
     0x0808 * (m & 2) | 0x0100 * (m & 1)
  sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.

It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.
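
For reference, a standalone C sketch of the formula above (the helper name
is illustrative, not the driver's code); the number of set bits in
dither_mask equals m, so the surviving coverage is proportional to alpha.

    #include <stdint.h>

    static uint32_t alpha_to_coverage_sample_mask(float src0_alpha,
                                                  uint32_t sample_mask)
    {
        /* m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)) */
        float clamped = src0_alpha < 0.0f ? 0.0f :
                        src0_alpha > 1.0f ? 1.0f : src0_alpha;
        int m = (int)(16.0f * clamped);

        uint32_t dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
                               0x0808 * (m & 2) |
                               0x0100 * (m & 1);

        return sample_mask & dither_mask;
    }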

GEN6 hardware does not have an issue with the simultaneous use of sample
mask and alpha to coverage; however, due to the wrong send order of oMask
and src0_alpha it is still affected by it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-03-25 13:54:55 -07:00
Iago Toral Quiroga
3766334923 compiler/nir: add lowering for 16-bit flrp
And enable it on Intel.
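
For reference, a generic C sketch of the usual flrp identity the lowering
relies on (not necessarily the exact NIR sequence, and shown at 32-bit for
simplicity; the same identity applies at 16 bits):

    /* flrp(a, b, t) = a * (1 - t) + b * t */
    static float flrp_lowered(float a, float b, float t)
    {
        return a * (1.0f - t) + b * t;
    }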

v2:
 - Squash the change to enable it on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga
ca31df6f1f compiler/nir: add lowering option for 16-bit fmod
And enable it on Intel.
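
For reference, a generic C sketch of the usual fmod lowering identity (not
necessarily the exact NIR sequence, and shown at 32-bit for simplicity),
where fmod follows GLSL mod() semantics:

    #include <math.h>

    /* fmod(a, b) = a - b * floor(a / b) */
    static float fmod_lowered(float a, float b)
    {
        return a - b * floorf(a / b);
    }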

v2:
 - Squash the change to enable this lowering on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Caio Marcelo de Oliveira Filho
fb024f5e72 intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCT
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-23 10:22:39 -07:00
Jason Ekstrand
08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Jason Ekstrand
d3386e73c5 intel/nir: Lower array-deref-of-vector UBO and SSBO loads
This fixes a serious performance issue with DXVK:

https://github.com/doitsujin/dxvk/issues/937

This was caused by a recent change intended to improve performance on RADV
which backfired on ANV and killed performance for some apps:

e5a06d3f4a

Throwing in this bit of lowering lets us come along and CSE those UBO
loads (or copy-prop for SSBO load) and get one load where we previously
would have gotten several.
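
A loose illustration of what the lowering enables (plain C with hypothetical
names; the real pass works on NIR derefs): an indexed load of one component
becomes a load of the whole vector plus a component select, so repeated
loads of the same vector can be combined.

    typedef struct { float v[4]; } vec4;

    /* Before: each access is its own indexed load from the UBO. */
    static float load_indexed(const vec4 *ubo_block, unsigned i)
    {
        return ubo_block->v[i];
    }

    /* After: load the whole vec4 once, then pick the component; the vector
     * load is identical for every access, so CSE keeps only one of them. */
    static float load_lowered(const vec4 *ubo_block, unsigned i)
    {
        vec4 whole = *ubo_block;
        return whole.v[i];
    }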

VkPipeline-db results on Kaby Lake:

    total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
    instructions in affected programs: 1754333 -> 1712157 (-2.40%)
    helped: 5331
    HURT: 63

    total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
    cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
    helped: 9202
    HURT: 4323

    total loops in shared programs: 3340 -> 3331 (-0.27%)
    loops in affected programs: 9 -> 0
    helped: 9
    HURT: 0

    total spills in shared programs: 3246 -> 3053 (-5.95%)
    spills in affected programs: 384 -> 191 (-50.26%)
    helped: 10
    HURT: 5

    total fills in shared programs: 4626 -> 4452 (-3.76%)
    fills in affected programs: 439 -> 265 (-39.64%)
    helped: 10
    HURT: 5

All of the shaders with hurt spilling were in Rise of the Tomb Raider
which also had shaders solidly helped in the spilling department.  Not
shown in those results (because I've not had success dumping the
shaders) is Witcher 3 where this reduces spilling and improves overall
perf by around 20-25%.  There were no shader-db changes.  Apparently,
this just isn't a pattern that happens in OpenGL.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: "19.0" mesa-stable@lists.freedesktop.org
2019-03-15 23:10:27 -05:00
Jason Ekstrand
be2990d8fb i965: Stop setting LowerBufferInterfaceBlocks
Instead, we do UBO and SSBO deref lowering in NIR after we've given it a
chance to optimize SSBO access.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15235775 -> 15235484 (<.01%)
    instructions in affected programs: 14992 -> 14701 (-1.94%)
    helped: 19
    HURT: 20

    total cycles in shared programs: 339220331 -> 339027307 (-0.06%)
    cycles in affected programs: 79831981 -> 79638957 (-0.24%)
    helped: 540
    HURT: 602

    total loops in shared programs: 4402 -> 4348 (-1.23%)
    loops in affected programs: 186 -> 132 (-29.03%)
    helped: 27
    HURT: 0

    total spills in shared programs: 23261 -> 23234 (-0.12%)
    spills in affected programs: 38 -> 11 (-71.05%)
    helped: 1
    HURT: 0

    total fills in shared programs: 31442 -> 31371 (-0.23%)
    fills in affected programs: 98 -> 27 (-72.45%)
    helped: 1
    HURT: 0

    LOST:   12
    GAINED: 12

Most of the help and hurt in instruction counts was just churn caused by
re-ordering of optimizations and the fact that the NIR deref lowering
code is emitting slightly different instructions.  Nothing was hurt by
more than three instructions and most things weren't helped by more than
four.  The primary exception to this is one Car Chase shader:

    shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%)

There is also one compute shader in Manhattan 3.1 and a fragment shader
in the UE4 Shooter Game demo that now get a loop partially unrolled.
Those showed up in the results as hurt instructions but were manually
removed to get the results above.

The lost/gained were a dozen Car Chase shaders that went from SIMD8 to
SIMD16 thanks to improved register pressure:

    shaders/non-free/gfxbench4/carchase/366.shader_test CS
    shaders/non-free/gfxbench4/carchase/368.shader_test CS
    shaders/non-free/gfxbench4/carchase/370.shader_test CS
    shaders/non-free/gfxbench4/carchase/372.shader_test CS
    shaders/non-free/gfxbench4/carchase/376.shader_test CS
    shaders/non-free/gfxbench4/carchase/378.shader_test CS
    shaders/non-free/gfxbench4/carchase/380.shader_test CS
    shaders/non-free/gfxbench4/carchase/382.shader_test CS
    shaders/non-free/gfxbench4/carchase/384.shader_test CS
    shaders/non-free/gfxbench4/carchase/388.shader_test CS
    shaders/non-free/gfxbench4/carchase/4.shader_test CS
    shaders/non-free/gfxbench4/carchase/6.shader_test CS

Given how much it appeared to be improved, I ran Car Chase on my laptop.
Unfortunately, I wasn't able to see any measurable improvement.  It
might be helped by 1-2% but it's in the noise.  It does render correctly
as far as I can tell so the improvement is legitimate.

All of the loops that got deleted were in Dolphin uber shaders.  I've had
no opportunity to test them for correctness or performance.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Plamena Manolova
19ab082001 i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9
ARB_fragment_shader_interlock depends on memory fences to
ensure fragment ordering and this ordering guarantee is
only supported from GEN9 onwards.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support."
Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-14 13:04:12 +00:00
Caio Marcelo de Oliveira Filho
65e8761474 intel/nir: Combine store_derefs to improve code from SPIR-V
Due to the lack of a write mask in SPIR-V stores, generators may produce
multiple stores to the same vector using different array derefs.
Use the combining store pass to clean this up.  For example,

    layout(binding = 3) buffer block {
        vec4 v;
    };

    void main() {
        v.x = 11;
        v.y = 22;
    }

after going to SPIR-V and NIR, ends up with two store_derefs to
v[0] and v[1]

    vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
    vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */
    intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */
    vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */)
    vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */
    intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */

producing two different sends instructions on SKL.  The combining pass
transforms the snippet above into

    vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
    vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17
    intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */

producing a single sends instruction.

v2: Move this from spirv_to_nir into the general optimization pass for
    intel compiler.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho
10dfb0011e intel/nir: Combine store_derefs after vectorizing IO
Shader-db results for skl:

    total instructions in shared programs: 15232903 -> 15224781 (-0.05%)
    instructions in affected programs: 61246 -> 53124 (-13.26%)
    helped: 221
    HURT: 0

    total cycles in shared programs: 371440470 -> 371398018 (-0.01%)
    cycles in affected programs: 281363 -> 238911 (-15.09%)
    helped: 221
    HURT: 0

Results for bdw are very similar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho
822a8865e4 nir: Add a pass to combine store_derefs to same vector
v2: (all from Jason)
    Reuse existing function for the end of the block combinations.
    Check the SSA values are coming from the right place in tests.
    Document the case when the store to array_deref is reused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Kenneth Graunke
3570d15b6d intel/fs: Fix opt_peephole_csel to not throw away saturates.
We were not copying the saturate bit from the original instruction
to the new replacement instruction.  This caused major misrendering
in DiRT Rally on iris, where comparisons leading to discards failed
due to the missing saturate, causing lots of extra garbage pixels to
be drawn in text rendering, trees, and so on.

This did not show up on i965 because st/nir performs a more aggressive
version of nir_opt_peephole_select, yielding more b32csel operations.

Fixes: 52c7df1643 i965/fs: Merge CMP and SEL into CSEL on Gen8+

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 20:11:55 -07:00
Jason Ekstrand
6d5d89d25a intel/nir: Vectorize all IO
The IO scalarization pass that we run to help with linking ends up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector one.  To
alleviate this, we now vectorize the I/O once again.  This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
    instructions in affected programs: 342009 -> 338857 (-0.92%)
    helped: 1236
    HURT: 443

    total spills in shared programs: 23471 -> 23465 (-0.03%)
    spills in affected programs: 6 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 31770 -> 31766 (-0.01%)
    fills in affected programs: 4 -> 0
    helped: 1
    HURT: 0

Cycles were just a lot of churn due to moves being in different places.  Most
of the pure churn in instructions was +/- one or two instructions in
fragment shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-03-12 15:34:06 +00:00
Jason Ekstrand
179d254cba intel/nir: Move lower_mem_access_bit_sizes to postprocess_nir
It doesn't really matter where this pass goes as long as it's after we
call nir_lower_explicit_io and before we go into the back-end.  Putting
it in brw_postprocess_nir lets us move nir_lower_explicit_io significantly
later in the pipeline.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-08 22:03:14 -06:00
Brian Paul
0de83bacf0 intel/compiler: silence uninitialized variable warning in opt_vector_float()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-08 10:23:11 -07:00
Jason Ekstrand
656ace3dd8 intel/nir: Move 64-bit lowering later
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later.  This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything.  This substantially reduces both fp64 shader compile times
and the resulting code size.  On the vs-isnan-dvec test from piglit:

Before this commit:

    1684.63s user 17.29s system 99% cpu 28:28.24 total
    101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
    Peak memory usage (according to massif): 1.435 GB

After this commit:

    179.64s user 7.75s system 99% cpu 3:07.92 total
    57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
    Peak memory usage (according to massif): 531.0 MB

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
e02959f442 nir/lower_doubles: Inline functions directly in lower_doubles
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ourselves.  This means that there are no more nasty functions lying around
that the caller needs to worry about cleaning up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
8993e0973f intel/nir: Drop an unneeded lower_constant_initializers call
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed.  We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
fa4824c1db intel/debug: Add a debug flag to force software fp64
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Ian Romanick
55e6454d5e intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1).  The mask-a-word
trick only works for even numbered bytes.

This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.

v2: Use a SHR instead of an AND.  This saves an instruction compared to
using two moves.  Suggested by Jason.
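
A minimal C sketch of the corrected lowering (illustrative, not the
backend's code): any byte, odd or even, can be extracted with a shift
followed by a mask, which is what the SHR-based fix does.

    #include <stdint.h>

    static uint32_t extract_u8_from_u64(uint64_t value, unsigned byte)
    {
        /* Shift the selected byte down, then keep the low 8 bits. */
        return (uint32_t)(value >> (byte * 8)) & 0xff;
    }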

Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:45 -08:00
Ian Romanick
4aaf139ea4 intel/fs: nir_op_extract_i8 extracts a byte, not a word
Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:42 -08:00
Ian Romanick
bbf20a1ca3 intel/compiler: Silence unused parameter warning in brw_interpolation_map.c
The parameter is never used, and it's not part of a common interface
idiom.  Remove it.

src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
                             const struct gen_device_info *devinfo)
                                                           ^~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:36 -08:00
Ian Romanick
dea19138dd intel/compiler: Silence many unused parameter warnings in brw_eu.h
In file included from src/intel/compiler/brw_eu_util.c:34:0:
src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’:
src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_desc_header_present(const struct gen_device_info *devinfo,
                                                               ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’:
src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_ex_desc(const struct gen_device_info *devinfo,
                                                   ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’:
src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo,
                                                           ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo,
                                                                    ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’:
src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc)
                                                        ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’:
src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_return_format(const struct gen_device_info *devinfo,
                                                              ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo,
                                                               ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’:
src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
                                unsigned exec_size, /**< 0 for SIMD4x2 */
                                         ^~~~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’:
src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
                                      unsigned exec_size,
                                               ^~~~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:31 -08:00
Timothy Arceri
81ee2cd8ba glsl: rename is_record() -> is_struct()
The replacement was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Jason Ekstrand
61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand
5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Sagar Ghuge
e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, the "mul" instruction supports a 64-bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also, instead of emitting two mul instructions, we can emit a single mul
instruction and extract the low/high 32 bits from the 64-bit result for
[i/u]mulExtended.
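
A rough C sketch of the identities involved, with a hypothetical
mul_2x32_64 helper standing in for a mul that writes a 64-bit destination
(this is an illustration, not the NIR lowering itself):

    #include <stdint.h>

    /* Stands in for a single HW mul with a 64-bit destination. */
    static uint64_t mul_2x32_64(uint32_t a, uint32_t b)
    {
        return (uint64_t)a * b;
    }

    /* 64x64 multiply: one wide mul plus two 32-bit muls for the cross
     * terms, which only affect the high half of the result. */
    static uint64_t imul64(uint64_t a, uint64_t b)
    {
        uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
        uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

        return mul_2x32_64(a_lo, b_lo) +
               ((uint64_t)(a_lo * b_hi + a_hi * b_lo) << 32);
    }

    /* [i/u]mulExtended: one wide mul, then extract the low/high halves. */
    static void umul_extended(uint32_t a, uint32_t b,
                              uint32_t *msb, uint32_t *lsb)
    {
        uint64_t wide = mul_2x32_64(a, b);
        *lsb = (uint32_t)wide;
        *msb = (uint32_t)(wide >> 32);
    }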

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Jordan Justen
10c5579921
intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Ian Romanick
d1d56f5f9a intel/fs: Don't assert on b2f with a saturate modifier
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d60938 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
2019-03-02 13:58:50 -08:00
Matt Turner
e0148bbcfd
intel/compiler: Add commas on final values of compaction table arrays
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-03-01 13:56:25 -08:00
Ian Romanick
1edf67fc3f intel/fs: Generate if instructions with inverted conditions
Per-platform results were all over the place, so I have included all the
results here.  There is an important note at the bottom of the commit
message.

Skylake
total instructions in shared programs: 15184683 -> 15184679 (<.01%)
instructions in affected programs: 2786 -> 2782 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 370961367 -> 370961173 (<.01%)
cycles in affected programs: 205867 -> 205673 (-0.09%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16
helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -93.01 28.34
95% mean confidence interval for cycles %-change: -0.82% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 15465366 -> 15465362 (<.01%)
instructions in affected programs: 2799 -> 2795 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 410938419 -> 410938531 (<.01%)
cycles in affected programs: 566028 -> 566140 (0.02%)
helped: 18
HURT: 17
helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1
helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01%
HURT stats (abs)   min: 1 max: 12 x̄: 10.29 x̃: 12
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09%
95% mean confidence interval for cycles value: 0.31 6.09
95% mean confidence interval for cycles %-change: -0.10% 0.05%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13749760 -> 13749759 (<.01%)
instructions in affected programs: 2241 -> 2240 (-0.04%)
helped: 1
HURT: 0

total cycles in shared programs: 385398913 -> 385398363 (<.01%)
cycles in affected programs: 554914 -> 554364 (-0.10%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6
helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -45.88 11.51
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 180663626 -> 180663881 (<.01%)
cycles in affected programs: 472350 -> 472605 (0.05%)
helped: 15
HURT: 30
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 8 max: 10 x̄: 9.00 x̃: 9
HURT stats (rel)   min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: 4.21 7.12
95% mean confidence interval for cycles %-change: 0.05% 0.08%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154568664 -> 154569225 (<.01%)
cycles in affected programs: 356486 -> 357047 (0.16%)
helped: 1
HURT: 31
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs)   min: 4 max: 33 x̄: 18.16 x̃: 8
HURT stats (rel)   min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for cycles value: 12.19 22.87
95% mean confidence interval for cycles %-change: 0.10% 0.16%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8206589 -> 8206565 (<.01%)
instructions in affected programs: 3024 -> 3000 (-0.79%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.82% -0.77%
Instructions are helped.

total cycles in shared programs: 187657428 -> 187656228 (<.01%)
cycles in affected programs: 95748 -> 94548 (-1.25%)
helped: 12
HURT: 0
helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100
helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21%
95% mean confidence interval for cycles value: -113.27 -86.73
95% mean confidence interval for cycles %-change: -1.43% -1.11%
Cycles are helped.

GM45
total instructions in shared programs: 5037569 -> 5037557 (<.01%)
instructions in affected programs: 1521 -> 1509 (-0.79%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.83% -0.75%
Instructions are helped.

total cycles in shared programs: 128101478 -> 128100758 (<.01%)
cycles in affected programs: 52746 -> 52026 (-1.37%)
helped: 6
HURT: 0
helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120
helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41%
95% mean confidence interval for cycles value: -120.00 -120.00
95% mean confidence interval for cycles %-change: -1.70% -1.12%
Cycles are helped.

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f
with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071022 -> 15089710 (0.12%)
instructions in affected programs: 1022219 -> 1040907 (1.83%)
helped: 1
HURT: 3937
helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41
helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01%
HURT stats (abs)   min: 1 max: 256 x̄: 4.76 x̃: 4
HURT stats (rel)   min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60%
95% mean confidence interval for instructions value: 4.56 4.93
95% mean confidence interval for instructions %-change: 2.54% 2.64%
Instructions are HURT.

total cycles in shared programs: 369777134 -> 370092923 (0.09%)
cycles in affected programs: 17516573 -> 17832362 (1.80%)
helped: 115
HURT: 3624
helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28
helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65%
HURT stats (abs)   min: 1 max: 12640 x̄: 89.71 x̃: 54
HURT stats (rel)   min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52%
95% mean confidence interval for cycles value: 75.21 93.71
95% mean confidence interval for cycles %-change: 4.43% 4.64%
Cycles are HURT.

total spills in shared programs: 9450 -> 9442 (-0.08%)
spills in affected programs: 166 -> 158 (-4.82%)
helped: 2
HURT: 0

total fills in shared programs: 21115 -> 21094 (-0.10%)
fills in affected programs: 438 -> 417 (-4.79%)
helped: 2
HURT: 0

LOST:   1
GAINED: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
7725d60938 intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))
Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a))
maps -1 => 0.0 and 0 => 1.0.  This is equivalent to 1.0 +
float(boolBitsToInt(a)).  On Intel GPUs, ADD is one of the few
instructions that can type-convert during write to destination, so we
can achieve this in a single instruction:

    add    g47F, g26D, 1D
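
A tiny C sketch of the identity, with the NIR 0/-1 boolean encoding made
explicit (illustrative only):

    #include <stdint.h>

    /* a is a NIR-style boolean: 0 for false, -1 for true. */
    static float b2f_inot(int32_t a)
    {
        return (float)a + 1.0f;   /* -1 -> 0.0, 0 -> 1.0 */
    }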

v2: Fix swizzles.

v3: Fix typos in comments.  Noticed by Ken.

All Gen6+ platforms had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185583 -> 15184683 (<.01%)
instructions in affected programs: 239389 -> 238489 (-0.38%)
helped: 899
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for instructions value: -1.01 -0.99
95% mean confidence interval for instructions %-change: -0.51% -0.48%
Instructions are helped.

total cycles in shared programs: 370964249 -> 370961508 (<.01%)
cycles in affected programs: 1487586 -> 1484845 (-0.18%)
helped: 420
HURT: 268
helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6
helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41%
HURT stats (abs)   min: 1 max: 230 x̄: 24.90 x̃: 10
HURT stats (rel)   min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52%
95% mean confidence interval for cycles value: -7.61 -0.36
95% mean confidence interval for cycles %-change: -0.44% -0.02%
Cycles are helped.

No changes on Iron Lake or GM45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
cb3e21cd19 intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+
Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of
operands is free on Gen8+.

v2: Fix swizzles.  Fix types for cmod propagation.

v3: Simplify logic for inverting source of inot(ixor(a, b)).  Suggested
by Ken.

Skylake and Broadwell had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185593 -> 15185583 (<.01%)
instructions in affected programs: 5673 -> 5663 (-0.18%)
helped: 12
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -1.66 0.13
95% mean confidence interval for instructions %-change: -2.60% -0.15%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 370977726 -> 370964249 (<.01%)
cycles in affected programs: 869987 -> 856510 (-1.55%)
helped: 15
HURT: 2
helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16
helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53%
HURT stats (abs)   min: 14 max: 42 x̄: 28.00 x̃: 28
HURT stats (rel)   min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13%
95% mean confidence interval for cycles value: -1654.87 69.34
95% mean confidence interval for cycles %-change: -2.29% -0.23%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
8eb36c9129 intel/fs: Emit logical-not of operands on Gen8+
On Gen8+ specifying negation of a logical operation such as AND actually
performs a logical-not.  Take advantage of this to generate fewer
instructions.

v2: Major rebase.  Use nir_src_as_alu_instr.  Fix swizzle handling.

No changes on any pre-Gen8 platform.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15466902 -> 15466274 (<.01%)
instructions in affected programs: 1262953 -> 1262325 (-0.05%)
helped: 682
HURT: 4
helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04%
HURT stats (abs)   min: 1 max: 62 x̄: 17.50 x̃: 3
HURT stats (rel)   min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10%
95% mean confidence interval for instructions value: -1.10 -0.73
95% mean confidence interval for instructions %-change: -0.19% -0.15%
Instructions are helped.

total cycles in shared programs: 410996093 -> 410950440 (-0.01%)
cycles in affected programs: 144389048 -> 144343395 (-0.03%)
helped: 519
HURT: 51
helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140
helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03%
HURT stats (abs)   min: 1 max: 4060 x̄: 167.90 x̃: 22
HURT stats (rel)   min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25%
95% mean confidence interval for cycles value: -97.16 -63.02
95% mean confidence interval for cycles %-change: -0.32% -0.13%
Cycles are helped.

total spills in shared programs: 95311 -> 95329 (0.02%)
spills in affected programs: 881 -> 899 (2.04%)
helped: 0
HURT: 4

total fills in shared programs: 93629 -> 93634 (<.01%)
fills in affected programs: 794 -> 799 (0.63%)
helped: 1
HURT: 2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
06eaaf2de9 intel/fs: Refactor ALU source and destination handling to a separate function
Other places will need to do this soon to properly handle source
swizzles.  The patch looks a little odd, but the change is pretty
straightforward.  All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.

I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive.  I am open to
suggestions of alternatives.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
fb3ca9109c intel/fs: Handle OR source modifiers in algebraic optimization
Found by inspection.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
c9d5bd050c intel/fs: Relax type matching rules in cmod propagation from MOV instructions
To allow cmod propagation from a MOV in a sequence like:

    and(16)         g31<1>UD       g20<8,8,1>UD   g22<8,8,1>UD
    mov.nz.f0(16)   null<1>F       g31<8,8,1>D

A similar change to the vec4 backend had no effect.

Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.

A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
d2056ab993 intel/vec4: Emit constants for some ALU sources as immediate values
In some cases of flow control, the constant propagation is not able to
determine that the source of an instruction must be a constant value.
When we still have NIR SSA values, we can easily determine this.  Emit
the immediate value during code generation to possibly avoid spurious
loads of constants into registers.

I wrote this patch to prevent a couple trivial regressions in vec4
shaders caused by "nir/algebraic: Replace i2b used by bcsel or
if-statement with comparison".  The final result was quite a bit better
than that...

No shader-db changes on any Gen8+ platform.

v2: Assert that we never get a negation source modifier on Gen8+.
Suggested by Ken.  This should never happen because we don't normally
use vec4 for Gen8+ (it requires an environment variable to force it), and
there's no code to generate these negations.  Still, erring on the side
of caution is better.

Haswell
total instructions in shared programs: 13776218 -> 13764783 (-0.08%)
instructions in affected programs: 663931 -> 652496 (-1.72%)
helped: 3495
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49%
HURT stats (abs)   min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel)   min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24%
95% mean confidence interval for instructions value: -3.39 -3.15
95% mean confidence interval for instructions %-change: -1.84% -1.75%
Instructions are helped.

total cycles in shared programs: 386818984 -> 386511910 (-0.08%)
cycles in affected programs: 20379636 -> 20072562 (-1.51%)
helped: 3052
HURT: 476
helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6
helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69%
HURT stats (abs)   min: 2 max: 416 x̄: 62.76 x̃: 24
HURT stats (rel)   min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18%
95% mean confidence interval for cycles value: -115.57 -58.51
95% mean confidence interval for cycles %-change: -0.93% -0.73%
Cycles are helped.

total spills in shared programs: 100482 -> 100480 (<.01%)
spills in affected programs: 79 -> 77 (-2.53%)
helped: 3
HURT: 1

total fills in shared programs: 96883 -> 96877 (<.01%)
fills in affected programs: 85 -> 79 (-7.06%)
helped: 4
HURT: 0

Ivy Bridge
total instructions in shared programs: 12000562 -> 11990113 (-0.09%)
instructions in affected programs: 572581 -> 562132 (-1.82%)
helped: 3106
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49%
95% mean confidence interval for instructions value: -3.49 -3.23
95% mean confidence interval for instructions %-change: -1.91% -1.81%
Instructions are helped.

total cycles in shared programs: 180958504 -> 180664500 (-0.16%)
cycles in affected programs: 19991810 -> 19697806 (-1.47%)
helped: 2654
HURT: 486
helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6
helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68%
HURT stats (abs)   min: 2 max: 396 x̄: 59.18 x̃: 24
HURT stats (rel)   min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16%
95% mean confidence interval for cycles value: -125.62 -61.64
95% mean confidence interval for cycles %-change: -0.76% -0.56%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10842336 -> 10835438 (-0.06%)
instructions in affected programs: 395340 -> 388442 (-1.74%)
helped: 1926
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2
helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42%
95% mean confidence interval for instructions value: -3.73 -3.43
95% mean confidence interval for instructions %-change: -1.84% -1.72%
Instructions are helped.

total cycles in shared programs: 154590074 -> 154569050 (-0.01%)
cycles in affected programs: 8159932 -> 8138908 (-0.26%)
helped: 1670
HURT: 228
helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6
helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28%
HURT stats (abs)   min: 2 max: 1798 x̄: 40.58 x̃: 14
HURT stats (rel)   min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31%
95% mean confidence interval for cycles value: -13.51 -8.64
95% mean confidence interval for cycles %-change: -0.60% -0.46%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8212357 -> 8206587 (-0.07%)
instructions in affected programs: 323664 -> 317894 (-1.78%)
helped: 1457
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3
helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44%
95% mean confidence interval for instructions value: -4.14 -3.78
95% mean confidence interval for instructions %-change: -1.93% -1.78%
Instructions are helped.

total cycles in shared programs: 187668016 -> 187657422 (<.01%)
cycles in affected programs: 14856234 -> 14845640 (-0.07%)
helped: 1372
HURT: 83
helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6
helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08%
HURT stats (abs)   min: 2 max: 14 x̄: 3.20 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for cycles value: -7.65 -6.91
95% mean confidence interval for cycles %-change: -0.11% -0.10%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:41:46 -08:00
Jason Ekstrand
e8f863e718 intel/compiler: Re-prefix non-logical surface opcodes with VEC4
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there.  Prefix them with VEC4 so
it's clear that they're only used by the vec4 back-end.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
95ae400abc intel/schedule_instructions: Move some comments
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
aeaba24fcb intel/compiler: Drop unused surface opcodes
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops.  There's no reason to keep these opcodes around
anymore.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
a04c737215 intel/fs: Get rid of the IMAGE_SIZE opcode
Since switching to SHADER_OPCODE_SEND for image operations, we no longer
need the non-logical opcode.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
10b7d14c31 intel/vec4: Drop dead code for handling typed surface messages
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
9d437f9482 intel/fs: Drop the fs_surface_builder
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes.  The only thing that the surface
builder is providing at this point is extra levels of functions to call
through.  I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
494a0543e6 intel/fs: Re-order logical surface arguments
It makes more sense to start at the surface then move on to the address
and then the data.  Also, this is a really good test of whether or not
we got all the places that use the sources by explicit integer number.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
94f8fd9a0c intel/fs: Add an enum type for logical sampler inst sources
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00