Commit graph

1355 commits

Author SHA1 Message Date
Alejandro Piñeiro
4e1f8d82c2 i965/fs: include multisamplers on image_intrinsic_coord_components
This is the second patch needed to fix the following piglit tests:

   tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test
   tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test

Although in this case it doesn't affect so many borrowed tests, as
there aren't too many tests using multisamplers on Intel.

It is worth to note that this patch is also needed when those tests
are run on GLSL mode (using the --glsl option). Although most Intel
drivers would not be able to run/execute tests using multisamplers, as
GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to
link correctly, so linking tests should pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-09-05 17:02:28 +02:00
Alejandro Piñeiro
2a6182fe06 intel/compiler: rename brw_nir_lower_glsl_images
To brw_nir_lower_gl_images, as it will be also used on the
ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the
lowering is about images following the OpenGL semantics. In any case
"brw_nir_lower_opengl_images" seemed too long to me, so I just used
gl. That shortening is already used on other parts of the code.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-09-05 17:02:28 +02:00
Jason Ekstrand
67571ae796 intel/compiler: Remove redundant nir_remove_dead_variables call
As of 07a2098a70, brw_nir_optimize calls nir_remove_dead_variables as
the last optimization.  Doing it again is just pointless.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-09-04 09:03:16 -05:00
Lionel Landwerlin
07a2098a70 intel: compiler: remove dead local variables at optimization pass
We're hitting an assert in gfxbench because one of the local variable
is a sampler (according to Jason this isn't valid) :

testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type*, unsigned int*, unsigned int*): Assertion `!"type does not have a natural size"' failed.

Since this particular variable isn't used, it can be eliminated by
removing unused local variables at the end of the optimization loop.
This makes sense also for valid local variables.

v2: Move additional local variable removal out of optimization loop,
    but before large constant removal (Jason/Lionel)

v3: Move the removal at the end of brw_nir_optimize()

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-09-03 17:24:19 +01:00
Ian Romanick
82530ce1b5 i965/vec4: Clamp indirect tes input array reads with 0x0fffffff
Page 190 of "Volume 7: 3D Media GPGPU Engine (Haswell)" says the valid
range of the offset is [0, 0FFFFFFFh].

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2018-09-01 00:23:45 -07:00
Ian Romanick
75666605c9 i965/vec4: Correctly handle uniform sources in generate_tes_add_indirect_urb_offset
Fixes failure in the new piglit test
tes-patch-input-array-vec2-index-invalid-rd.shader_test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2018-09-01 00:23:43 -07:00
Rodrigo Vivi
e8c42ed4ab intel: Introducing Amber Lake platform
Amber Lake uses the same gen graphics as Kaby Lake, including a id
that were previously marked as reserved on Kaby Lake, but that
now is moved to AML page.

This follows the ids and approach used on kernel's commit
e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform")

Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-31 13:57:52 -07:00
Jason Ekstrand
a0f18f2142 intel/nir: Lowering image loads and stores trashes all metadata
This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8
platforms where the lack of metadata dirtying was causing another pass
to accidentally delete a much needed loop.

https://bugs.freedesktop.org/show_bug.cgi?id=107745
Fixes: 37f7983bcc "intel/compiler: Do image load/store lowering..."
Jason Ekstrand <jason@jlekstrand.net> writes:
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-30 14:06:31 -05:00
Jason Ekstrand
d8033d4083 intel/compiler: Remove surface_idx from brw_image_param
Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Jason Ekstrand
3cbc02e469 intel: Use TXS for image_size when we have a typed surface
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Jason Ekstrand
09f1de97a7 anv,i965: Lower away image derefs in the driver
Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params.  As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index.  There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.

This also has the side-effect of making most image use compile-time
binding table indices.  Previously, all image access pulled the binding
table index from a uniform.  Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart.  Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
    instructions in affected programs: 115834 -> 113255 (-2.23%)
    helped: 191
    HURT: 0

    total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
    cycles in affected programs: 4757115 -> 4642085 (-2.42%)
    helped: 73
    HURT: 67

    total spills in shared programs: 10951 -> 10926 (-0.23%)
    spills in affected programs: 742 -> 717 (-3.37%)
    helped: 7
    HURT: 0

    total fills in shared programs: 22226 -> 22201 (-0.11%)
    fills in affected programs: 1146 -> 1121 (-2.18%)
    helped: 7
    HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Jason Ekstrand
3942943819 nir: Use a bitfield for image access qualifiers
This commit expands the current memory access enum to contain the extra
two bits provided for images.  We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand
4289143899 intel/compiler: Use two components for 1D array image sizes
Having the array length component stored in .z was a small convenience
for the ISL image param filling code and an annoyance in the NIR
lowering code.  The only convenience of treating 1D arrays like 2D
arrays in the lowering code is in the address calculation code so let's
put all the complexity there as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand
37f7983bcc intel/compiler: Do image load/store lowering to NIR
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166910 -> 15166872 (<.01%)
    instructions in affected programs: 5895 -> 5857 (-0.64%)
    helped: 15
    HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Ian Romanick
c836326a29 i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:50 -07:00
Ian Romanick
c856403868 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1
v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:46 -07:00
Ian Romanick
b6e247cf0e i965/fs: Refactor image atomics to be a bit more like other atomics
This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:46 -07:00
Ian Romanick
fabe3ead57 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:38 -07:00
Ian Romanick
41399f4bc7 intel/compiler: Silence unused parameter warnings in brw_eu.h
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it.  Keep the parameter,
but mark it as unused.

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_pixel_interp_desc(const struct gen_device_info *devinfo,
                                                     ^~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:38 -07:00
Kevin Rogovin
03ecec9ed2 i965: Add INTEL_fragment_shader_ordering support.
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Sagar Ghuge
a1e3305f75 intel/eu: print bytes instead of 32 bit hex value
INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-27 11:07:39 -07:00
Jason Ekstrand
8d8222461f intel/nir: Enable nir_opt_find_array_copies
We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15176942 -> 15176942 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:47:51 -05:00
Jason Ekstrand
a4a9c07549 intel/nir: Use nir_shrink_vec_array_vars
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15176765 (<.01%)
    instructions in affected programs: 4259 -> 3419 (-19.72%)
    helped: 1
    HURT: 0

    total spills in shared programs: 10954 -> 10855 (-0.90%)
    spills in affected programs: 295 -> 196 (-33.56%)
    helped: 1
    HURT: 0

    total fills in shared programs: 22222 -> 22117 (-0.47%)
    fills in affected programs: 417 -> 312 (-25.18%)
    helped: 1
    HURT: 0

The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:46:56 -05:00
Jason Ekstrand
02a5442dd7 intel/nir: Use the new structure and array splitting passes
We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go.  We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15177605 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally.  It doesn't actually
split the variables but it's ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:44:14 -05:00
Ian Romanick
d515c75463 intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages
v2: Split changes to the message type field to another patch.  Suggested
by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick
f347348f8a intel/compiler: Expand untyped atomic message type field by a bit
This is necessary for a new Gen9 message type that will be added in the
next patch.  There are also Gen8 message types that need the extra bit
(mostly for bindless).

v2: Split off from the next patch.  Suggested by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick
d628642a34 intel/compiler: Silence unused parameter warnings
src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’:
src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                             ^~~~~
src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                          ^~
src/intel/compiler/brw_disasm.c: In function ‘src_ia1’:
src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter]
         unsigned _reg_file,
                  ^~~~~~~~~
src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’:
src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter]
                                 unsigned dims, unsigned size,
                                                         ^~~~

v2: Update commit message.  brw_fs_generator.cpp warnings were already
fixed by another patch.  Noticed by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Jason Ekstrand
10f44da775 Revert "intel/nir: Call nir_lower_io_to_scalar_early"
Commit 4434591bf5 caused substantially more URB messages in
geometry and tessellation shaders.  Before we can really enable this
sort of optimization,  We either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2018-08-15 17:56:50 -05:00
Mathieu Bridon
2ee1c86d71 meson: Build with Python 3
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-10 15:15:09 -07:00
Kenneth Graunke
08a5c395ab intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.

We were generating this instruction in the RT write payload setup:

   mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };

which is illegal, instructions with a source region spanning more than
one register need to be aligned to even registers.  This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.

I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.

Normally, we rely on the register allocator to even-align our virtual
GRFs.  But we don't control the payload, so we need to lower SIMD widths
to make it work.  To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).

Fixes: 8aee87fe4c (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
2018-08-09 12:33:41 -07:00
Jordan Justen
8fcdb71d8c
intel/compiler: Add brw_get_compiler_config_value for disk cache
During code review, Jason pointed out that:

2b3064c073 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"

Didn't account for INTEL_SCALER_* environment variables.

To fix this, let the compiler return the disk_cache driver flags.

Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 23:49:16 -07:00
Jason Ekstrand
b2e0b0dad6 anv/pipeline: More aggressively optimize away color attachments
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass.  This lets us potentially
throw away chunks of the fragment shader.  In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
4434591bf5 intel/nir: Call nir_lower_io_to_scalar_early
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
    instructions in affected programs: 2390284 -> 2296942 (-3.91%)
    helped: 16469
    HURT: 505

    total loops in shared programs: 4954 -> 4951 (-0.06%)
    loops in affected programs: 3 -> 0
    helped: 3
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
b0bb547f78 intel/nir: Split IO arrays into elements
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables.  This can help link-time optimization by allowing us to
remove varyings at a more granular level.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
    instructions in affected programs: 79857 -> 70706 (-11.46%)
    helped: 392
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
57804efa88 i965/fs: Flag all slots of a flat input as flat
Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others.  This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison.  Previously, we did the interpolation wrong consistently in
both versions.  However, with one of Tim Arceri's NIR linkingpatches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version.  As of this commit, they both get
correctly interpolated.

Fixes: e61cc87c75 "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
4e060385e9 intel/nir: Use the correct scalar stage for consumers when linking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Iago Toral Quiroga
471bce5689 intel/compiler: implement 8-bit constant load
Fixes VK-GL-CTS CL#2567

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Iago Toral Quiroga
7e6c8b0cb7 intel/compiler: add setup_imm_(u)b helpers
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Iago Toral Quiroga
615aaedb93 intel/compiler: fix lower conversions to account for predication
The pass can create a temporary result for the instruction and then
moves from it to the original destination, however, if the original
instruction was predicated, the mov has to be predicated as well.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2018-07-27 14:48:29 +02:00
Kenneth Graunke
488972222c i965: Combine both gl_PatchVerticesIn lowering passes.
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases.  Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward.  This simplified the
passes, but made life painful for the callers.

This patch combines both into a single pass.  If you give it a non-zero
static count, it uses that.  If you give it Mesa state slots, it turns
it back into a built-in uniform.  Otherwise, it does nothing.

This also moves the i965 uniform lowering out to shared code.

v2: Make token arrays const.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-07-26 21:51:36 -07:00
Kenneth Graunke
8794fe3e30 intel/compiler: Delete dead VS intrinsic handling.
These are lowered by brw_nir_lower_vs_inputs().  If they weren't, we
would have already hit the unreachable() in emit_system_values_block().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-26 11:45:34 -07:00
Karol Herbst
7f95564a22 nir: rename f2f16_undef to f2f16
we need rounding modes on other conversions involving floats and it is easier
to rename f2f16_undef than renaming all the other ones.

v2: rebased on master

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-24 20:40:05 +02:00
Jason Ekstrand
820d5e51b7 intel/compiler: Account for built-in uniforms in analyze_ubo_ranges
The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.  This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.

Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
2018-07-23 15:28:17 -07:00
Caio Marcelo de Oliveira Filho
4a29ee1861 intel/compiler: fix -Wsign-compare warning
Explicitly convert to signed integer. Conversion is valid since is the
same (implicitly) used to initialize the loop. Avoids the warning:

../../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::lower_simd_width()’:
../../src/intel/compiler/brw_fs.cpp:5761:45: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
             split_inst.eot = inst->eot && i == n - 1;
                                           ~~^~~~~~~~

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Caio Marcelo de Oliveira Filho
7df5f62768 intel/compiler: silence -Wclass-memaccess warnings
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Jose Maria Casanova Crespo
62f37ee53d i965/fs: unspills shoudn't use grf127 as dest since Gen8+
At 232ed89802 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+ although
the backend still uses MRFs internally for sends they are finally
assigned to a GRFs.

In the case of unspills the backend assigns directly as source its
destination because it is suppose to be available. So we always have a
source-destination overlap. If the reg_allocator assigns registers that
include the grf127 we fail the validation rule that affects Gen8+
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.

We also need to avoid that opt_bank_conflicts() optimization, that runs
after the register allocation, doesn't move things around, causing the
grf127 to be used in the condition we were avoiding.

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashed because of the grf127 validation rule..

v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)

Found by Caio Marcelo de Oliveira Filho

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-12 18:02:26 +02:00
Francisco Jerez
18c086a9e6 intel/ir: Uncomment definition of several unused hardware opcodes.
There are a number of opcode_desc table entries for many of these
unused opcodes.  A symbolic opcode enum will be required in a future
commit in order to keep them in the opcode description tables.  The
alternative would be to remove the unused opcodes from the opcode
description tables.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
48d6fc5eb6 intel/fs: Initialize mlen for gen7 varying pull constant load messages.
This makes the message length available at the IR level, which should
save some guesswork in a future commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
6643143f6e intel/eu: Assert that the instruction is send-like in brw_set_desc_ex().
Constructing a descriptor in-place as part of the immediate of an ALU
instruction is no longer supported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
6f81e2b994 intel/eu: Get rid of the return value of brw_send_indirect_message().
The return value is not used anymore.  This allows simplifying the
code slightly, and in addition it should frustrate anybody's attempts
to continue using the obsolete piecemeal approach to construct a
message descriptor in combination with brw_send_indirect_message().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00