Commit graph

11 commits

Author SHA1 Message Date
Jason Ekstrand
b4f0d062cd intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand
014edff0d2 intel/fs: Add interference between SENDS sources
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
7f1cf046cd intel/fs: Add a generic SEND opcode
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Matt Turner
7e4e9da90d intel/compiler: Prevent warnings in the following patch
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.

Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:41 -08:00
Iago Toral Quiroga
6c418dfa42 intel/compiler: fix node interference of simd16 instructions
SIMD16 instructions need to have additional interferences to prevent
source / destination hazards when the source and destination registers
are off by one register.

While we already have code to handle this, it was only running for SIMD16
dispatches, however, we can have SIDM16 instructions in a SIMD8 dispatch.
An example of this are pull constant loads since commit b56fa830c6,
but there are more cases.

This fixes a number of CTS test failures found in work-in-progress
tests that were hitting this situation for 16-wide pull constants
in a SIMD8 program.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-11-09 08:22:08 +01:00
Jose Maria Casanova Crespo
62f37ee53d i965/fs: unspills shoudn't use grf127 as dest since Gen8+
At 232ed89802 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+ although
the backend still uses MRFs internally for sends they are finally
assigned to a GRFs.

In the case of unspills the backend assigns directly as source its
destination because it is suppose to be available. So we always have a
source-destination overlap. If the reg_allocator assigns registers that
include the grf127 we fail the validation rule that affects Gen8+
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.

We also need to avoid that opt_bank_conflicts() optimization, that runs
after the register allocation, doesn't move things around, causing the
grf127 to be used in the condition we were avoiding.

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashed because of the grf127 validation rule..

v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)

Found by Caio Marcelo de Oliveira Filho

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-12 18:02:26 +02:00
Jose Maria Casanova Crespo
232ed89802 i965/fs: Register allocator shoudn't use grf127 for sends dest
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.

For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.

If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
    with a fixed assignation to grf127 and create interferences to send
    message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
    (Jose Maria Casanova)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
2018-07-10 00:14:49 +02:00
Francisco Jerez
58324389be intel/fs: Take into account amount of data read in spilling cost heuristic.
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.

Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%.  In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 11:01:40 -07:00
Francisco Jerez
ecc19e12dc intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 10:59:56 -07:00
Francisco Jerez
147e71242c i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-11 15:28:54 -07:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp (Browse further)