Commit graph

1147 commits

Author SHA1 Message Date
Italo Nicola
59623f211b intel/compiler: remove old comment
This comment was correct some time ago, but since commit
d3c10ad427, it isn't true anymore.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-18 10:20:34 -08:00
Danylo Piliaiev
0904ee0c60 intel/fs: Do not lower large local arrays to scratch on gen7
On gen7 and earlier the scratch space size is limited to 12kB.
By enabling this optimization we may easily exceed this limit
without having any fallback.

arb_compute_shader/linker/bug-93840.shader_test crashes with
this lowering on IVB due to exceeding scratch size limit.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092
Fixes: 69244fc7
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-14 20:08:30 +00:00
Paulo Zanoni
eb6352162d intel/compiler: fix nir_op_{i,u}*32 on ICL
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.

This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
    dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

This can also be verified by simply changing fix_byte_src() to apply
on all platforms.

Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-13 22:13:52 +00:00
Jason Ekstrand
69244fc72a intel/fs: Lower large local arrays to scratch
Shader-db results on Kaby Lake:

    total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
    instructions in affected programs: 72428 -> 23244 (-67.91%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
    helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
    HURT stats (abs)   min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
    HURT stats (rel)   min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
    95% mean confidence interval for instructions value: -11947.03 -348.97
    95% mean confidence interval for instructions %-change: -125.72% 202.37%
    Inconclusive result (%-change mean confidence interval includes 0).

    total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
    cycles in affected programs: 28144921 -> 2116965 (-92.48%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
    helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
    HURT stats (abs)   min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
    HURT stats (rel)   min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
    95% mean confidence interval for cycles value: -5900438.73 -606550.27
    95% mean confidence interval for cycles %-change: -140.79% 146.16%
    Inconclusive result (%-change mean confidence interval includes 0).

    total spills in shared programs: 9243 -> 8901 (-3.70%)
    spills in affected programs: 2718 -> 2376 (-12.58%)
    helped: 4
    HURT: 4

    total fills in shared programs: 21831 -> 10141 (-53.55%)
    fills in affected programs: 11804 -> 114 (-99.03%)
    helped: 6
    HURT: 2

    total sends in shared programs: 815912 -> 815912 (0.00%)
    sends in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    LOST:   1
    GAINED: 3

The helped shaders are all compute shaders in Aztec Ruins.  There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield an 15-20% performance boost to the
benchmark as a whole on my KBL laptop.  The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
53bfcdeecf intel/fs: Implement the new load/store_scratch intrinsics
This commit fills in a number of different pieces:

 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
    new intrinsics.  This involves simple plumbing work as well as a
    tiny bit of extra logic to always scalarize scratch intrinsics

 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
    into byte/dword scattered read/write messages which use the A32
    stateless model.

 3. Add code to lower_surface_logical_send to handle dword scattered
    messages and the A32 stateless model.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
e2297699de intel/nir: Plumb devinfo through lower_mem_access_bit_sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
1dff48af05 intel/fs: refactor surface header setup
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
a0999bc049 intel/fs: Add DWord scattered read/write opcodes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
83f04d80b0 intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes
The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Paulo Zanoni
b57383a944 intel/compiler: remove the operand restriction for src1 on GLK
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers or MR !1125
pointed this, and the commit history shows removal of GLK to parts of
the patch, but it turns there was still a left-over GLK check in the
code.

This code was breaking some of the i8vec2 tests on GLK, for example:
  dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

Removing the GLK check solves the issue for GLK. I don't see a reason
on why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.

Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1
on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-05 00:08:34 +00:00
Ian Romanick
7b3f38ef69 intel/compiler: Report the number of non-spill/fill SEND messages on vec4 too
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too.  When it didn't find
it, the script would fail with:

    Traceback (most recent call last):
      File "./report.py", line 351, in <module>
        main()
      File "./report.py", line 182, in main
        before_count = before[p][m]
    KeyError: 'sends'

Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 21:27:03 -07:00
Jordan Justen
2b186264cc
intel/eu/validate/gen12: Add TGL to eu_validate tests.
These reworks were combined into this patch:

 * Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
 * Francisco Jerez: intel/eu/validate/gen12: Disable
   qword_low_power_no_depctrl eu_validate test.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 14:08:51 -07:00
Matt Turner
12d3b11908 intel/compiler: Add instruction compaction support on Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
c8fbc8823f intel/compiler: Make separate src0/src1 index tables
TGL uses different data (and even a different format!) for each source.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
cde73625f8 intel/compiler: Inline get_src_index()
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
d0eff8a539 intel/compiler: Restructure instruction compaction in preparation for Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
ded9fb2b18 intel/compiler: Remove unreachable() from brw_reg_type.c
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.

enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.

Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
2019-10-30 11:11:50 -07:00
Jason Ekstrand
24c0545b2d intel/vec4: Set brw_stage_prog_data::has_ubo_pull
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur.  Unfortunately, he neglected to set
the bit in the vec4 back-end.  This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable.  We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.

Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 16:05:57 +00:00
Timothy Arceri
7f106a2b5d util: rename list_empty() to list_is_empty()
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-10-28 11:24:38 +00:00
Danylo Piliaiev
12a8f2616a intel/compiler: Fix C++ one definition rule violations
When building with "-flto" brw::block_data definitions
were colliding.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-28 12:02:40 +02:00
Caio Marcelo de Oliveira Filho
e142061399 intel/fs: Implement scoped_memory_barrier
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-24 11:39:56 -07:00
Michel Dänzer
2b1b56cb3a intel/fs: Check for NULL key in fs_visitor constructor
Flagged by UBSan:

../src/intel/compiler/brw_fs_visitor.cpp:986:20: runtime error: member access within null pointer of type 'const struct brw_base_prog_key'
    #0 0x559fadb48556 in fs_visitor::init() ../src/intel/compiler/brw_fs_visitor.cpp:986
    #1 0x559fadb46db3 in fs_visitor::fs_visitor(brw_compiler const*, void*, void*, brw_base_prog_key const*, brw_stage_prog_data*, nir_shader const*, unsigned int, int, brw_vue_map const*) ../src/intel/compiler/brw_fs_visitor.cpp:962
    #2 0x559fad9c7cd8 in saturate_propagation_fs_visitor::saturate_propagation_fs_visitor(brw_compiler*, brw_wm_prog_data*, nir_shader*) (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x61bcd8)
    #3 0x559fad9960a1 in saturate_propagation_test::SetUp() ../src/intel/compiler/test_fs_saturate_propagation.cpp:65
    #4 0x559fadd7a32d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #5 0x559fadd65c3b in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #6 0x559fadd0af75 in testing::Test::Run() ../src/gtest/src/gtest.cc:2470
    #7 0x559fadd0d8a4 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
    #8 0x559fadd10032 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
    #9 0x559fadd2ba0c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
    #10 0x559fadd7df46 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #11 0x559fadd69613 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #12 0x559fadd2302e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
    #13 0x559fadda2d61 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
    #14 0x559fadda2c21 in main ../src/gtest/src/gtest_main.cc:37
    #15 0x7fe8f6748bba in __libc_start_main ../csu/libc-start.c:308
    #16 0x559fad9950f9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x5e90f9)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2019-10-24 16:20:04 +02:00
Michel Dänzer
41623be20e intel/compiler: Cast to target type before shifting left
Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:

../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
    #0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66
    #1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70
    #2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
    #5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
    #6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
    #7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
    #8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
    #11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
    #12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37
    #13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308
    #14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2019-10-24 16:19:23 +02:00
Michel Dänzer
59b72bdfb4 intel/compiler: Don't left-shift by >= the number of bits of the type
To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.

Flagged by UBSan:

../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
    #0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
    #1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
    #2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
    #3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
    #4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
    #7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
    #8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
    #9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
    #10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
    #13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
    #14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
    #15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
    #16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)

Reviewed-by: Adam Jackson <ajax@redhat.com>
2019-10-24 16:16:49 +02:00
Sagar Ghuge
97e6d34e66 intel/compiler: Refactor disassembly of sources in 3src instruction
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge
18b28b5654 intel/compiler: Don't move immediate in register
On Gen12, we support mixed mode HF/F operands, and also 3 source
instruction supports immediate value support, so keep immediate as it
is, if it fits properly in 16 bit field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge
bf943bdf24 intel/compiler: Set bits according to source file
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge
c018c5a339 intel/compiler: Add Immediate support for 3 source instruction
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge
f729ecefef intel/compiler: Remove emit_alpha_to_coverage workaround from backend
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.

v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)

Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-21 11:27:29 -07:00
Sagar Ghuge
7ecfbd4f6d nir: Add alpha_to_coverage lowering pass
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.

v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask

v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler

v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-21 11:27:29 -07:00
Kenneth Graunke
f192741ddd intel/compiler: Report the number of non-spill/fill SEND messages
This can be useful to measure whether memory access optimizations are
having the desired effect.  For example, we might see a reduction in
image loads/stores, or constant buffer loads.  We can already see this
in cycle estimates to some extent, but this is a more direct approach,
minus a lot of the noise of random scheduler shuffling.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-17 20:44:00 -07:00
Ian Romanick
92252219d3 intel/vec4: Don't try both sources as immediates for DPH
DPH isn't actually commutative, so this doesn't work.  If the immediate
in src0 would be a VF candidate, we could do better. *shrug*

No shader-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b04beaf41d ("intel/vec4: Try both sources as candidates for being immediates")
2019-10-17 15:07:01 -07:00
Caio Marcelo de Oliveira Filho
c847bfaaf5 intel/fs/gen12: Add tests for scoreboard pass
Tests the combinations of cases of RAW, WAW and WAR hazards involving
both inorder and outoforder instructions.  Also tests that
dependencies combine and propagate correctly through control
flow (loops and conditionals).

v2: Add an extra test illustrating that the non-logical CFG edge
    between then-block and else-block is being taking into
    account.  (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-10-17 10:02:35 -07:00
Kenneth Graunke
44754279ac intel/fs/gen12: Use TCS 8_PATCH mode.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-10-11 12:24:16 -07:00
Jason Ekstrand
c92fb60007 intel/fs/gen12: Implement gl_FrontFacing on gen12+.
The bit moved on gen12 in order to prepare for dual-SIMD8 dispatch.
This implementation isn't an entirely complete as it only works on SIMD8
and SIMD16 and not dual-SIMD8.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
ceb123befa intel/fs/gen11+: Fix CS_OPCODE_CS_TERMINATE codegen.
Apparently the ts_request_type and ts_resource_select thread spawner
message descriptor bits were removed from the hardware at least since
ICL.  Drop them in order to avoid assertion failures on Gen12+
platforms which don't have any encoding for this.  On Gen9+ these are
probably just ignored by the hardware, so this is unlikely to have had
any functional implications prior to Gen12.

v2: Mark TS message fields as non-existing in brw_inst.h on ICL. (Caio)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
a5efb0eae8 intel/fs/gen12: Fix barrier codegen.
The WAIT instruction has been removed, but SYNC.bar can be used
instead to wait for a notification on n0.0.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
6b52f81395 intel/eu: Don't set notify descriptor field of gateway barrier message.
Apparently this field was removed on SKL, and according to the
hardware docs for previous platforms "This field is only valid for a
ForwardMsg message. It is ignored for other messages. The BarrierMsg
message always increments the N0 notification counter".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
b0e69d115e intel/ir/gen12: Update assert in brw_stage_has_packed_dispatch().
Confirmed no regressions after a full Piglit run on TGL with the
brw_fs_test_dispatch_packing() test enabled.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Jason Ekstrand
ca7b6fd392 intel/eu/validate/gen12: Don't blow up on indirect src0.
They look like a NULL source if you don't look at the address mode.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
ab5aa01689 intel/eu/validate/gen12: Validation fixes for SEND instruction.
The following fix-up by Jordan Justen is squashed in:

 intel/eu/validate: gen12 send instruction doesn't have a dst type field

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
a81f9b5e3e intel/eu/validate/gen12: Fix validation of SYNC instruction.
src0 will typically be null for this instruction.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
45768e6b3c intel/eu/validate/gen12: Implement integer multiply restrictions in EU validator.
Due to hardware bug filed as HSDES#1604601757.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Jordan Justen
f9ec4ac5a1 intel/ir: Lower fpow on Gen12.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
cb6db5bfb3 intel/fs/gen12: Don't support source mods for 32x16 integer multiply.
Due to hardware bug filed as HSDES#1604601757.

v2: Only return if result of fs_inst::can_do_source_mods() is known to
    be false for the case new orthogonal restrictions are implemented
    below in the future. (Caio)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
de5d106ccf intel/disasm: Disassemble register file of split SEND sources.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
c03869323b intel/disasm: Don't disassemble saturate control on SEND instructions.
The field is gone on Gen12+ and it was illegal on previous
generations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
f15e0b3439 intel/disasm/gen12: Disassemble Gen12 SEND instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
fd7e21dd90 intel/disasm/gen12: Disassemble Gen12 SYNC instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
606d823b42 intel/disasm/gen12: Disassemble three-source instruction source and destination regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00