Commit graph

763 commits

Author SHA1 Message Date
Jason Ekstrand
820d5e51b7 intel/compiler: Account for built-in uniforms in analyze_ubo_ranges
The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.  This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.

Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
2018-07-23 15:28:17 -07:00
Caio Marcelo de Oliveira Filho
4a29ee1861 intel/compiler: fix -Wsign-compare warning
Explicitly convert to signed integer. Conversion is valid since is the
same (implicitly) used to initialize the loop. Avoids the warning:

../../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::lower_simd_width()’:
../../src/intel/compiler/brw_fs.cpp:5761:45: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
             split_inst.eot = inst->eot && i == n - 1;
                                           ~~^~~~~~~~

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Caio Marcelo de Oliveira Filho
7df5f62768 intel/compiler: silence -Wclass-memaccess warnings
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Jose Maria Casanova Crespo
62f37ee53d i965/fs: unspills shoudn't use grf127 as dest since Gen8+
At 232ed89802 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+ although
the backend still uses MRFs internally for sends they are finally
assigned to a GRFs.

In the case of unspills the backend assigns directly as source its
destination because it is suppose to be available. So we always have a
source-destination overlap. If the reg_allocator assigns registers that
include the grf127 we fail the validation rule that affects Gen8+
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.

We also need to avoid that opt_bank_conflicts() optimization, that runs
after the register allocation, doesn't move things around, causing the
grf127 to be used in the condition we were avoiding.

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashed because of the grf127 validation rule..

v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)

Found by Caio Marcelo de Oliveira Filho

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-12 18:02:26 +02:00
Francisco Jerez
18c086a9e6 intel/ir: Uncomment definition of several unused hardware opcodes.
There are a number of opcode_desc table entries for many of these
unused opcodes.  A symbolic opcode enum will be required in a future
commit in order to keep them in the opcode description tables.  The
alternative would be to remove the unused opcodes from the opcode
description tables.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
48d6fc5eb6 intel/fs: Initialize mlen for gen7 varying pull constant load messages.
This makes the message length available at the IR level, which should
save some guesswork in a future commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
6643143f6e intel/eu: Assert that the instruction is send-like in brw_set_desc_ex().
Constructing a descriptor in-place as part of the immediate of an ALU
instruction is no longer supported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
6f81e2b994 intel/eu: Get rid of the return value of brw_send_indirect_message().
The return value is not used anymore.  This allows simplifying the
code slightly, and in addition it should frustrate anybody's attempts
to continue using the obsolete piecemeal approach to construct a
message descriptor in combination with brw_send_indirect_message().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
b3cce4c130 intel/eu: Get rid of the return value of brw_send_indirect_surface_message().
All users of brw_send_indirect_surface_message() should be providing a
full descriptor immediate up front by now, this isn't necessary
anymore.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
95b5367149 intel/eu: Use descriptor constructors for dataport typed surface messages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
94166cef40 intel/eu: Use descriptor constructors for dataport scattered byte surface messages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
2a9605d610 intel/eu: Use descriptor constructors for dataport untyped surface messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
8e707fc2af intel/eu: Provide single descriptor argument to brw_send_indirect_surface_message().
Instead of the current message_len, response_len and header_present
arguments.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
b10b4e7c45 intel/eu: Use descriptor constructors for pixel interpolator messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
8fa4bc4676 intel/eu: Use descriptor constructors for dataport write messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
2bac890bf5 intel/eu: Use descriptor constructors for dataport read messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
27c211e30f intel/eu: Use descriptor constructors for sampler messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
1c90ae5acc intel/eu: Provide desc immediate argument up front to brw_send_indirect_message().
The current approach of returning a setup instruction where additional
descriptor fields can be specified is still supported in order to keep
things working, but it will be removed later in this series.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
b382bdde1d TRIVIAL: intel/eu: Use a local devinfo variable in brw_shader_time_add().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
c3793d49e4 intel/eu: Use brw_set_desc() along with a helper to set common descriptor controls.
This replaces brw_set_message_descriptor() with the composition of
brw_set_desc() and a new inline helper function that packs the common
message descriptor controls into an integer.  The goal is to represent
all message descriptors as a 32-bit integer which is written at once
into the instruction, which is more flexible (SENDS anyone?), robust
(see d2eecf0b0b fixing an issue
ultimately caused by some bits of the extended message descriptor
being left undefined) and future-proof than the current approach of
specifying the individual descriptor fields directly into the
instruction.

This approach also seems more self-documenting, since it will allow
removing calls to functions with way too many arguments like
brw_set_*_message() and brw_send_indirect_message(), and instead
provide a single descriptor argument constructed from an appropriate
combination of brw_*_desc() helpers.

Note that because brw_set_message_descriptor() was (conditionally?)
overriding fields of the instruction which strictly speaking weren't
part of the message descriptor, this involves calling
brw_inst_set_sfid() and brw_inst_set_eot() in some cases in addition
to brw_set_desc().

v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
20b962232b intel/eu: Define SET_BITS helper more easily reusable than SET_FIELD.
Allows to specify a bitfield based on its upper and lower bounds
instead of a symbolic field definition, kind of what the current
GET_BITS macro is to GET_FIELD.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
d0f589a55b intel/eu: Define helper to specify the descriptor immediates of a SEND instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
f55884cad3 intel/eu: Add brw_inst.h helpers for the SEND(C) descriptor and extended descriptor.
This introduces helpers that can be used to specify or extract the
whole descriptor of a SEND message instruction at once.  Because the
the instruction encoding of these is rather awkward on some
generations using the generic brw_inst.h macros doesn't seem like an
option.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Iago Toral Quiroga
213491600a intel/compiler: emit actual barriers for working-group level barriers
Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code, however, recent CTS tests
have showed that we actually need them to produce correct behavior.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 07:46:34 +02:00
Jose Maria Casanova Crespo
cd0afab99b i965/fs: Enable store_ssbo for 8-bit types.
v2: Update comment according to this patch. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo
11c904d0d3 intel/compiler: relax brw_eu_validate for byte raw movs
When the destination is a BYTE type allow raw movs
even if the stride is not exact multiple of destination
type and exec type, execution type is Word and its size is 2.

This restriction was only allowing stride==2 destinations
for 8-bit types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
87fc9af3fc i965/fs: Enable conversions to 8-bit integers
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
030472c1f0 i965: Support for 8-bit base types in helper functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
232ed89802 i965/fs: Register allocator shoudn't use grf127 for sends dest
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.

For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.

If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
    with a fixed assignation to grf127 and create interferences to send
    message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
    (Jose Maria Casanova)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
0e47ecb29a intel/compiler: grf127 can not be dest when src and dest overlap in send
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):

"r127 must not be used for return address when there is a src and
dest overlap in send instruction."

v2: Style fixes (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
6706b421f0 intel/fs: use uint type for per_slot_offset at GS
This helps us to compact original instruction:

mul(8)  g3<1>D  g6<8,8,1>UD  0x00000006UD { align1 1Q };

So now we emit:

mul(8)  g3<1>UD g6<8,8,1>UD  0x00000006UD { align1 1Q compacted };

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-07-09 15:28:48 +02:00
Iago Toral Quiroga
81ca08e030 intel/compiler: remove unused function
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-09 13:21:48 +02:00
Ian Romanick
f8e54d02f7 intel/compiler: Relax mixed type restriction for saturating immediates
At the time of commit 7bc6e455e2 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible.  We
were only thinking about type converting moves from D to F, for
example.  However, type converting moves w/saturate from F to DF are
definitely possible.  This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.

Fixes new piglit tests:
 - arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
 - arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:10 -07:00
Ian Romanick
9626ea497d i965/vec4: Properly handle sign(-abs(x))
This is achived by copying the sign(abs(x)) optimization from the FS
backend.

On Gen7 an earlier platforms, this fixes new piglit tests:

 - glsl-1.10/execution/vs-sign-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:07 -07:00
Ian Romanick
88bd37c010 i965/fs: Properly handle sign(-abs(x))
Fixes new piglit tests:

 - glsl-1.10/execution/fs-sign-neg-abs.shader_test
 - glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:04 -07:00
Mathieu Bridon
0f7b18fa0d python: Use the print function
In Python 2, `print` was a statement, but it became a function in
Python 3.

Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2018-07-06 10:04:22 -07:00
Ian Romanick
965a06dbd7 i965/vec4: Make the vec4_visitor::nir_emit_instr default case unreachable
The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI.  Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.

An alternate solution is to delete the default case so that the compiler
will issue a warning.  That may require more work since there are other
(impossible) cases that exist.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-05 21:13:32 -07:00
Ian Romanick
a4d4787327 intel/compiler: More DCE after lowering
Some of the lowering passes, nir_lower_locals_to_regs for example, can
cause some previously live code to be dead.  This pass in particular
leaves a bunch of nir_instr_type_deref instructions floating around.
This causes shader-db runs on Gen5 through Haswell to spew tons of
messages like:

    VS instruction not yet implemented by NIR->vec4

UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that
generates these messages.  Cleaning up the dead code fixes that.

To verify, I did a shader-db before and after.  Even though all the
messages are gone, the results make my brain hurt. :(

Haswell
total cycles in shared programs: 411890163 -> 411891145 (<.01%)
cycles in affected programs: 57016 -> 57998 (1.72%)
helped: 3
HURT: 11
helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
HURT stats (abs)   min: 18 max: 686 x̄: 115.64 x̃: 20
HURT stats (rel)   min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
95% mean confidence interval for cycles value: -51.39 191.67
95% mean confidence interval for cycles %-change: -0.14% 2.46%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 259114802 -> 259115032 (<.01%)
cycles in affected programs: 24034 -> 24264 (0.96%)
helped: 1
HURT: 9
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
HURT stats (abs)   min: 18 max: 48 x̄: 25.78 x̃: 20
HURT stats (rel)   min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
95% mean confidence interval for cycles value: 12.42 33.58
95% mean confidence interval for cycles %-change: 0.54% 1.38%
Cycles are HURT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 5a02ffb733 nir: Rework lower_locals_to_regs to use deref instructions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-05 21:13:21 -07:00
Neil Roberts
2d5ddbe960 i965: Fix output register sizes when variable ranges are interleaved
In 6f5abf3146 this code was fixed to calculate the maximum size of
an attribute in a seperate pass and then allocate the registers to
that size. However this wasn’t taking into account ranges that overlap
but don’t have the same starting location. For example:

layout(location = 0, component = 0) out float a[4];
layout(location = 2, component = 1) out float b[4];

Previously, if ‘a’ was processed first then it would allocate a
register of size 4 for location 0 and it wouldn’t allocate another
register for location 2 because it would already be covered by the
range of 0. Then if something tries to write to b[2] it would try to
write past the end of the register allocated for ‘a’ and it would hit
an assert.

This patch changes it to scan for any overlapping ranges that start
within each range to calculate the maximum extent and allocate that
instead.

Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/
vs-fs-array-interleave-range.shader_test

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 6f5abf3146 "i965: Fix output register sizes when multiple variables
       share a slot."
2018-07-04 10:57:51 +02:00
Ian Romanick
995d993710 i965/vec4: Don't cmod propagate from CMP to ADD if the writemask isn't compatible
Otherwise we can incorrectly cmod propagate in situations like

    add(8)          g10<1>.xD       g2<0>.xD        -16D
    ...
    cmp.ge.f0(8)    null<1>D        g2<0>.xD        16D
    ...
    (+f0) sel(8)    g21<1>.xyUD     g14<4>.xyyyUD   g18<4>.xyyyUD

Sadly, this change hurts quite a few shaders.

v2: Refactor writemask compatibility check into a separate function.
Suggested by Caio.

Ivy Bridge and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 12968489 -> 12968738 (<.01%)
instructions in affected programs: 60679 -> 60928 (0.41%)
helped: 0
HURT: 249
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.22% max: 0.81% x̄: 0.46% x̃: 0.44%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.44% 0.48%
Instructions are HURT.

total cycles in shared programs: 409171965 -> 409172317 (<.01%)
cycles in affected programs: 260056 -> 260408 (0.14%)
helped: 0
HURT: 176
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.04% max: 0.34% x̄: 0.17% x̃: 0.17%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: 0.16% 0.18%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10423577 -> 10423753 (<.01%)
instructions in affected programs: 40667 -> 40843 (0.43%)
helped: 0
HURT: 176
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.29% max: 0.79% x̄: 0.48% x̃: 0.42%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.46% 0.51%
Instructions are HURT.

total cycles in shared programs: 146097503 -> 146097855 (<.01%)
cycles in affected programs: 503990 -> 504342 (0.07%)
helped: 0
HURT: 176
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.36% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: 0.11% 0.13%
Cycles are HURT.

No changes on any other platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: cd635d149b i965/vec4: Propagate conditional modifiers from compares to adds
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-02 19:19:16 -07:00
Ian Romanick
fb6dc8e894 intel/compiler: Silence unused parameter warnings brw_nir.c
src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’:
src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter]
                           bool is_scalar)
                                ^~~~~~~~~
src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’:
src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter]
 lower_bit_size_callback(const nir_alu_instr *alu, void *data)
                                                         ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-02 16:17:19 -07:00
Jason Ekstrand
06412bfc98 anv,intel: Enable nir_opt_large_constants for Vulkan
According to RenderDoc, this shaves 99.6% of the run time off of the
ambient occlusion pass in Skyrim Special Edition when running under DXVK
and shaves 92% off the runtime for a reasonably representative frame.
When running the actual game, Skyrim goes from being a slide-show to a
very stable and playable framerate on my SKL GT4e machine.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-02 12:09:50 -07:00
Francisco Jerez
c2c803be7b intel/fs: Build 32-wide FS shaders.
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-28 13:25:21 -07:00
Jason Ekstrand
d5e028a57b intel/fs: Add fields to wm_prog_data for SIMD32 dispatch
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
bcbc7d3a17 intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
7144247c2c intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
37c1df28c9 intel/fs: Fix Gen6+ interpolation setup for SIMD32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
e208bc3bb7 intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS
We can just emit the MOV in the two places where we use this.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
5e3028d826 intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround
There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up.  Just emit it once like we really want.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
40fe108e2b intel/fs: Generalize the unlit centroid workaround
This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00