Commit graph

588 commits

Author SHA1 Message Date
Ian Romanick
fabe3ead57 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.
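
A minimal sketch of the opcode selection described above, assuming the operand is known to be a compile-time constant. The enum and function names are hypothetical stand-ins, not the driver's BRW_AOP_* definitions or the function added by this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in opcode names for this sketch only. */
enum atomic_op_sketch { AOP_ADD, AOP_INC, AOP_DEC };

/* Pick the atomic opcode for an atomicAdd whose operand may be a
 * known constant.  Anything other than a constant +1 or -1 stays a
 * plain add. */
static enum atomic_op_sketch
select_atomic_add_op(bool operand_is_const, int32_t operand)
{
   if (operand_is_const && operand == 1)
      return AOP_INC;   /* the +1 is implied by the opcode */
   if (operand_is_const && operand == -1)
      return AOP_DEC;   /* the -1 is implied by the opcode */
   return AOP_ADD;
}
```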

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:38 -07:00
Ian Romanick
41399f4bc7 intel/compiler: Silence unused parameter warnings in brw_eu.h
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it.  Keep the parameter,
but mark it as unused.

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_pixel_interp_desc(const struct gen_device_info *devinfo,
                                                     ^~~~~~~
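
A rough illustration of keeping a parameter for API consistency while marking it unused. The struct, macro, function name, and bit positions below are all hypothetical; only the GCC/Clang `__attribute__((unused))` attribute is real.

```c
/* Hypothetical stand-in for the device info structure. */
struct device_info_sketch { int gen; };

/* Hypothetical macro name; expands to the GCC/Clang attribute. */
#define UNUSED_PARAM __attribute__((unused))

/* The devinfo parameter is kept so the signature matches sibling
 * helpers, but it is annotated so -Wunused-parameter stays quiet.
 * The shift amounts are arbitrary and only make the body non-trivial. */
static inline unsigned
pixel_interp_desc_sketch(UNUSED_PARAM const struct device_info_sketch *devinfo,
                         unsigned msg_type, unsigned simd_mode)
{
   return (msg_type << 12) | (simd_mode << 16);
}
```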

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:38 -07:00
Kevin Rogovin
03ecec9ed2 i965: Add INTEL_fragment_shader_ordering support.
Adds support for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB(), that is, by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Sagar Ghuge
a1e3305f75 intel/eu: print bytes instead of 32 bit hex value
INTEL_DEBUG=hex prints 32-bit hex values, and due to the endianness of
the CPU the byte order is reversed. To make it possible to disassemble
binary files, print each byte instead of a 32-bit hex value.

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length
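
To illustrate the difference, here is a small sketch that is not the code from this patch: format_bytes() prints the bytes in memory order, while format_dword() prints the same four bytes as one 32-bit value, whose digit order depends on the host CPU's endianness. Both function names are made up for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format `len` bytes in memory order as "xx xx ..." into `out`. */
static void
format_bytes(const void *data, size_t len, char *out, size_t out_size)
{
   const uint8_t *bytes = data;
   size_t pos = 0;

   if (out_size == 0)
      return;
   out[0] = '\0';

   /* Each byte needs at most three characters plus the terminator. */
   for (size_t i = 0; i < len && pos + 3 < out_size; i++)
      pos += (size_t)snprintf(out + pos, out_size - pos,
                              i ? " %02x" : "%02x", bytes[i]);
}

/* Format the first four bytes as a single 32-bit hex value.  On a
 * little-endian host the digits come out in reverse byte order. */
static void
format_dword(const void *data, char *out, size_t out_size)
{
   uint32_t dword;
   memcpy(&dword, data, sizeof dword);
   snprintf(out, out_size, "%08" PRIx32, dword);
}
```

The byte-wise output is the same on any host, which is what makes it suitable as input for a disassembler reading a binary file.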

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-27 11:07:39 -07:00
Jason Ekstrand
8d8222461f intel/nir: Enable nir_opt_find_array_copies
We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15176942 -> 15176942 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:47:51 -05:00
Jason Ekstrand
a4a9c07549 intel/nir: Use nir_shrink_vec_array_vars
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15176765 (<.01%)
    instructions in affected programs: 4259 -> 3419 (-19.72%)
    helped: 1
    HURT: 0

    total spills in shared programs: 10954 -> 10855 (-0.90%)
    spills in affected programs: 295 -> 196 (-33.56%)
    helped: 1
    HURT: 0

    total fills in shared programs: 22222 -> 22117 (-0.47%)
    fills in affected programs: 417 -> 312 (-25.18%)
    helped: 1
    HURT: 0

The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:46:56 -05:00
Jason Ekstrand
02a5442dd7 intel/nir: Use the new structure and array splitting passes
We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go.  We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15177605 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally.  It doesn't actually
split the variables but its ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:44:14 -05:00
Ian Romanick
d515c75463 intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages
v2: Split changes to the message type field to another patch.  Suggested
by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick
f347348f8a intel/compiler: Expand untyped atomic message type field by a bit
This is necessary for a new Gen9 message type that will be added in the
next patch.  There are also Gen8 message types that need the extra bit
(mostly for bindless).

v2: Split off from the next patch.  Suggested by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick
d628642a34 intel/compiler: Silence unused parameter warnings
src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’:
src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                             ^~~~~
src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                          ^~
src/intel/compiler/brw_disasm.c: In function ‘src_ia1’:
src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter]
         unsigned _reg_file,
                  ^~~~~~~~~
src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’:
src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter]
                                 unsigned dims, unsigned size,
                                                         ^~~~

v2: Update commit message.  brw_fs_generator.cpp warnings were already
fixed by another patch.  Noticed by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Jason Ekstrand
10f44da775 Revert "intel/nir: Call nir_lower_io_to_scalar_early"
Commit 4434591bf5 caused substantially more URB messages in
geometry and tessellation shaders.  Before we can really enable this
sort of optimization, we either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2018-08-15 17:56:50 -05:00
Mathieu Bridon
2ee1c86d71 meson: Build with Python 3
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-10 15:15:09 -07:00
Kenneth Graunke
08a5c395ab intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.

We were generating this instruction in the RT write payload setup:

   mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };

which is illegal: instructions with a source region spanning more than
one register need to be aligned to even registers.  This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.

I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
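
The register arithmetic behind this can be sketched as follows. The helper names are invented for illustration; the (nr | 1) versus (nr + 1) behavior is taken from the description above.

```c
/* Register the hardware reads for the second half of a compressed
 * instruction, per the description above: (nr | 1). */
static unsigned
second_half_reg_hw(unsigned nr)
{
   return nr | 1;
}

/* Register that actually holds the second half of the data. */
static unsigned
second_half_reg_wanted(unsigned nr)
{
   return nr + 1;
}
```

For an even start such as g4 both give g5. For the unaligned g5 the hardware form yields g5 again instead of g6, which matches the duplicated subspans.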

Normally, we rely on the register allocator to even-align our virtual
GRFs.  But we don't control the payload, so we need to lower SIMD widths
to make it work.  To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).

Fixes: 8aee87fe4c (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
2018-08-09 12:33:41 -07:00
Jordan Justen
8fcdb71d8c intel/compiler: Add brw_get_compiler_config_value for disk cache
During code review, Jason pointed out that:

2b3064c073 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"

Didn't account for INTEL_SCALER_* environment variables.

To fix this, let the compiler return the disk_cache driver flags.

Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 23:49:16 -07:00
Jason Ekstrand
b2e0b0dad6 anv/pipeline: More aggressively optimize away color attachments
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass.  This lets us potentially
throw away chunks of the fragment shader.  In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.
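
A minimal sketch of the idea, assuming the subpass lists one attachment index per color output. The macro and function names are hypothetical; the sentinel mirrors the value of VK_ATTACHMENT_UNUSED (~0U).

```c
#include <stdint.h>

/* Same value as VK_ATTACHMENT_UNUSED, redefined so the sketch is
 * self-contained. */
#define ATTACHMENT_UNUSED_SKETCH (~0u)

/* Build a bitmask of the color outputs that are actually backed by
 * an attachment.  Outputs whose bit is clear could be dropped from
 * the fragment shader. */
static uint32_t
used_color_attachment_mask(const uint32_t *attachments, unsigned count)
{
   uint32_t mask = 0;
   for (unsigned i = 0; i < count && i < 32; i++) {
      if (attachments[i] != ATTACHMENT_UNUSED_SKETCH)
         mask |= UINT32_C(1) << i;
   }
   return mask;
}
```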

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
4434591bf5 intel/nir: Call nir_lower_io_to_scalar_early
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
    instructions in affected programs: 2390284 -> 2296942 (-3.91%)
    helped: 16469
    HURT: 505

    total loops in shared programs: 4954 -> 4951 (-0.06%)
    loops in affected programs: 3 -> 0
    helped: 3
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
b0bb547f78 intel/nir: Split IO arrays into elements
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables.  This can help link-time optimization by allowing us to
remove varyings at a more granular level.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
    instructions in affected programs: 79857 -> 70706 (-11.46%)
    helped: 392
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
57804efa88 i965/fs: Flag all slots of a flat input as flat
Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others.  This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison.  Previously, we did the interpolation wrong consistently in
both versions.  However, with one of Tim Arceri's NIR linking patches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version.  As of this commit, they both get
correctly interpolated.
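
As an illustration only, with a made-up function name and an assumed 64-bit slot mask rather than the actual prog_data layout, flagging every slot of an input rather than just the first looks like this:

```c
#include <stdint.h>

/* Set one bit per vec4 slot occupied by an input, starting at
 * first_slot.  A mat4 occupies four slots, so all four bits must be
 * set for the whole matrix to be treated as flat. */
static uint64_t
flat_input_mask(unsigned first_slot, unsigned num_slots)
{
   uint64_t mask = 0;
   for (unsigned i = 0; i < num_slots && first_slot + i < 64; i++)
      mask |= UINT64_C(1) << (first_slot + i);
   return mask;
}
```

Passing num_slots = 1 for a matrix reproduces the bug described above: only the first vec4 is marked.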

Fixes: e61cc87c75 "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
4e060385e9 intel/nir: Use the correct scalar stage for consumers when linking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Iago Toral Quiroga
471bce5689 intel/compiler: implement 8-bit constant load
Fixes VK-GL-CTS CL#2567

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Iago Toral Quiroga
7e6c8b0cb7 intel/compiler: add setup_imm_(u)b helpers
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Iago Toral Quiroga
615aaedb93 intel/compiler: fix lower conversions to account for predication
The pass can create a temporary result for the instruction and then
moves from it to the original destination, however, if the original
instruction was predicated, the mov has to be predicated as well.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2018-07-27 14:48:29 +02:00
Kenneth Graunke
488972222c i965: Combine both gl_PatchVerticesIn lowering passes.
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases.  Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward.  This simplified the
passes, but made life painful for the callers.

This patch combines both into a single pass.  If you give it a non-zero
static count, it uses that.  If you give it Mesa state slots, it turns
it back into a built-in uniform.  Otherwise, it does nothing.
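
The three-way behavior can be sketched as a small decision function. The enum, function name, and parameter types are hypothetical and only mirror the rules stated above, not the pass's real signature.

```c
#include <stddef.h>

enum patch_vertices_lowering {
   LOWER_NONE,          /* leave gl_PatchVerticesIn alone */
   LOWER_TO_CONSTANT,   /* replace with the static count */
   LOWER_TO_UNIFORM,    /* turn into a built-in uniform */
};

/* A non-zero static count wins; otherwise state slots select the
 * uniform path; otherwise nothing is lowered. */
static enum patch_vertices_lowering
choose_patch_vertices_lowering(unsigned static_count, const void *state_slots)
{
   if (static_count != 0)
      return LOWER_TO_CONSTANT;
   if (state_slots != NULL)
      return LOWER_TO_UNIFORM;
   return LOWER_NONE;
}
```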

This also moves the i965 uniform lowering out to shared code.

v2: Make token arrays const.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-07-26 21:51:36 -07:00
Kenneth Graunke
8794fe3e30 intel/compiler: Delete dead VS intrinsic handling.
These are lowered by brw_nir_lower_vs_inputs().  If they weren't, we
would have already hit the unreachable() in emit_system_values_block().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-26 11:45:34 -07:00
Karol Herbst
7f95564a22 nir: rename f2f16_undef to f2f16
We need rounding modes on other conversions involving floats, and it is
easier to rename f2f16_undef than to rename all the other ones.

v2: rebased on master

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-24 20:40:05 +02:00
Jason Ekstrand
820d5e51b7 intel/compiler: Account for built-in uniforms in analyze_ubo_ranges
The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.  This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.

Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
2018-07-23 15:28:17 -07:00
Caio Marcelo de Oliveira Filho
4a29ee1861 intel/compiler: fix -Wsign-compare warning
Explicitly convert to signed integer. The conversion is valid since it is
the same one (implicitly) used to initialize the loop. Avoids the warning:

../../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::lower_simd_width()’:
../../src/intel/compiler/brw_fs.cpp:5761:45: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
             split_inst.eot = inst->eot && i == n - 1;
                                           ~~^~~~~~~~
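
One way such a comparison can be made explicit is sketched below. The helper is invented for illustration, and the exact expression used in the patch may differ.

```c
#include <stdbool.h>

/* True when i indexes the last of n splits.  Writing `i == n - 1`
 * with an int i and an unsigned n mixes signedness; converting n
 * explicitly keeps the whole comparison signed. */
static bool
is_last_split(int i, unsigned n)
{
   return i == (int)n - 1;
}
```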

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Caio Marcelo de Oliveira Filho
7df5f62768 intel/compiler: silence -Wclass-memaccess warnings
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Jose Maria Casanova Crespo
62f37ee53d i965/fs: unspills shoudn't use grf127 as dest since Gen8+
In 232ed89802 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. Since Gen7, although the
backend still uses MRFs internally for sends, they are ultimately
assigned to GRFs.

In the case of unspills, the backend directly assigns the destination as
the source because it is supposed to be available. So we always have a
source-destination overlap. If the register allocator assigns registers
that include grf127, we fail the validation rule that affects Gen8+:
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.

We also need to make sure that the opt_bank_conflicts() optimization,
which runs after register allocation, doesn't move things around and
cause grf127 to be used in the condition we were avoiding.

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashes caused by the grf127 validation rule.

v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)

Found by Caio Marcelo de Oliveira Filho

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-12 18:02:26 +02:00
Francisco Jerez
18c086a9e6 intel/ir: Uncomment definition of several unused hardware opcodes.
There are a number of opcode_desc table entries for many of these
unused opcodes.  A symbolic opcode enum will be required in a future
commit in order to keep them in the opcode description tables.  The
alternative would be to remove the unused opcodes from the opcode
description tables.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
48d6fc5eb6 intel/fs: Initialize mlen for gen7 varying pull constant load messages.
This makes the message length available at the IR level, which should
save some guesswork in a future commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
6643143f6e intel/eu: Assert that the instruction is send-like in brw_set_desc_ex().
Constructing a descriptor in-place as part of the immediate of an ALU
instruction is no longer supported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
6f81e2b994 intel/eu: Get rid of the return value of brw_send_indirect_message().
The return value is not used anymore.  This allows simplifying the
code slightly, and in addition it should frustrate anybody's attempts
to continue using the obsolete piecemeal approach to construct a
message descriptor in combination with brw_send_indirect_message().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
b3cce4c130 intel/eu: Get rid of the return value of brw_send_indirect_surface_message().
All users of brw_send_indirect_surface_message() should be providing a
full descriptor immediate up front by now, so this isn't necessary
anymore.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
95b5367149 intel/eu: Use descriptor constructors for dataport typed surface messages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
94166cef40 intel/eu: Use descriptor constructors for dataport scattered byte surface messages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
2a9605d610 intel/eu: Use descriptor constructors for dataport untyped surface messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
8e707fc2af intel/eu: Provide single descriptor argument to brw_send_indirect_surface_message().
Instead of the current message_len, response_len and header_present
arguments.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
b10b4e7c45 intel/eu: Use descriptor constructors for pixel interpolator messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
8fa4bc4676 intel/eu: Use descriptor constructors for dataport write messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
2bac890bf5 intel/eu: Use descriptor constructors for dataport read messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
27c211e30f intel/eu: Use descriptor constructors for sampler messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
1c90ae5acc intel/eu: Provide desc immediate argument up front to brw_send_indirect_message().
The current approach of returning a setup instruction where additional
descriptor fields can be specified is still supported in order to keep
things working, but it will be removed later in this series.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
b382bdde1d TRIVIAL: intel/eu: Use a local devinfo variable in brw_shader_time_add().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
c3793d49e4 intel/eu: Use brw_set_desc() along with a helper to set common descriptor controls.
This replaces brw_set_message_descriptor() with the composition of
brw_set_desc() and a new inline helper function that packs the common
message descriptor controls into an integer.  The goal is to represent
all message descriptors as a 32-bit integer which is written at once
into the instruction, which is more flexible (SENDS anyone?), robust
(see d2eecf0b0b fixing an issue
ultimately caused by some bits of the extended message descriptor
being left undefined) and future-proof than the current approach of
specifying the individual descriptor fields directly into the
instruction.

This approach also seems more self-documenting, since it will allow
removing calls to functions with way too many arguments like
brw_set_*_message() and brw_send_indirect_message(), and instead
provide a single descriptor argument constructed from an appropriate
combination of brw_*_desc() helpers.
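
A rough sketch of the "descriptor as one integer" idea. The helper names are hypothetical and the bit positions are assumptions chosen for illustration; they are not taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Place `value` into bits [high:low] of a 32-bit word, checking that
 * it fits in the field. */
static inline uint32_t
desc_field_sketch(uint32_t value, unsigned high, unsigned low)
{
   assert(high >= low && high < 32);
   const uint32_t width_mask = UINT32_MAX >> (31 - (high - low));
   assert((value & ~width_mask) == 0);
   return value << low;
}

/* Pack the common message controls into one integer.  The field
 * positions here are assumed, not authoritative. */
static inline uint32_t
message_desc_sketch(unsigned msg_length, unsigned response_length,
                    bool header_present)
{
   return desc_field_sketch(msg_length, 28, 25) |
          desc_field_sketch(response_length, 24, 20) |
          desc_field_sketch(header_present, 19, 19);
}
```

Because the result is a plain integer, message-specific helpers can OR in their own fields before the whole descriptor is written into the instruction in one step.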

Note that because brw_set_message_descriptor() was (conditionally?)
overriding fields of the instruction which strictly speaking weren't
part of the message descriptor, this involves calling
brw_inst_set_sfid() and brw_inst_set_eot() in some cases in addition
to brw_set_desc().

v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
20b962232b intel/eu: Define SET_BITS helper more easily reusable than SET_FIELD.
Allows specifying a bitfield based on its upper and lower bounds
instead of a symbolic field definition, kind of what the current
GET_BITS macro is to GET_FIELD.
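
A minimal sketch of a bounds-based setter and its matching getter. These are illustrative inline functions with made-up names, not the definitions added by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits [high:low] of a 32-bit word. */
static inline uint32_t
field_mask_sketch(unsigned high, unsigned low)
{
   assert(high >= low && high < 32);
   return (UINT32_MAX >> (31 - (high - low))) << low;
}

/* Shift `value` into bits [high:low]; the value must fit the field. */
static inline uint32_t
set_bits_sketch(uint32_t value, unsigned high, unsigned low)
{
   const uint32_t mask = field_mask_sketch(high, low);
   assert((value & ~(mask >> low)) == 0);
   return value << low;
}

/* Extract bits [high:low] from `word`. */
static inline uint32_t
get_bits_sketch(uint32_t word, unsigned high, unsigned low)
{
   return (word & field_mask_sketch(high, low)) >> low;
}
```

Specifying the field by its bounds avoids needing a separate shift/mask symbol pair for every field.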

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
d0f589a55b intel/eu: Define helper to specify the descriptor immediates of a SEND instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
f55884cad3 intel/eu: Add brw_inst.h helpers for the SEND(C) descriptor and extended descriptor.
This introduces helpers that can be used to specify or extract the
whole descriptor of a SEND message instruction at once.  Because the
instruction encoding of these is rather awkward on some generations,
using the generic brw_inst.h macros doesn't seem like an option.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Iago Toral Quiroga
213491600a intel/compiler: emit actual barriers for working-group level barriers
Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code.  However, recent CTS tests
have shown that we actually need them to produce correct behavior.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 07:46:34 +02:00
Jose Maria Casanova Crespo
cd0afab99b i965/fs: Enable store_ssbo for 8-bit types.
v2: Update comment according to this patch. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00