Commit graph

325 commits

Author SHA1 Message Date
Ian Romanick
c856403868 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1
v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:46 -07:00
Ian Romanick
b6e247cf0e i965/fs: Refactor image atomics to be a bit more like other atomics
This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:46 -07:00
Ian Romanick
fabe3ead57 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:38 -07:00
Kevin Rogovin
03ecec9ed2 i965: Add INTEL_fragment_shader_ordering support.
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Ian Romanick
d515c75463 intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages
v2: Split changes to the message type field to another patch.  Suggested
by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick
d628642a34 intel/compiler: Silence unused parameter warnings
src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’:
src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                             ^~~~~
src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter]
 __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                          ^~
src/intel/compiler/brw_disasm.c: In function ‘src_ia1’:
src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter]
         unsigned _reg_file,
                  ^~~~~~~~~
src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’:
src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter]
                                 unsigned dims, unsigned size,
                                                         ^~~~

v2: Update commit message.  brw_fs_generator.cpp warnings were already
fixed by another patch.  Noticed by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Iago Toral Quiroga
471bce5689 intel/compiler: implement 8-bit constant load
Fixes VK-GL-CTS CL#2567

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Iago Toral Quiroga
7e6c8b0cb7 intel/compiler: add setup_imm_(u)b helpers
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-01 08:08:15 +02:00
Kenneth Graunke
8794fe3e30 intel/compiler: Delete dead VS intrinsic handling.
These are lowered by brw_nir_lower_vs_inputs().  If they weren't, we
would have already hit the unreachable() in emit_system_values_block().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-26 11:45:34 -07:00
Karol Herbst
7f95564a22 nir: rename f2f16_undef to f2f16
we need rounding modes on other conversions involving floats and it is easier
to rename f2f16_undef than renaming all the other ones.

v2: rebased on master

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-24 20:40:05 +02:00
Iago Toral Quiroga
213491600a intel/compiler: emit actual barriers for working-group level barriers
Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code, however, recent CTS tests
have showed that we actually need them to produce correct behavior.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 07:46:34 +02:00
Jose Maria Casanova Crespo
cd0afab99b i965/fs: Enable store_ssbo for 8-bit types.
v2: Update comment according to this patch. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo
87fc9af3fc i965/fs: Enable conversions to 8-bit integers
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo
030472c1f0 i965: Support for 8-bit base types in helper functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Ian Romanick
88bd37c010 i965/fs: Properly handle sign(-abs(x))
Fixes new piglit tests:

 - glsl-1.10/execution/fs-sign-neg-abs.shader_test
 - glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:04 -07:00
Neil Roberts
2d5ddbe960 i965: Fix output register sizes when variable ranges are interleaved
In 6f5abf3146 this code was fixed to calculate the maximum size of
an attribute in a seperate pass and then allocate the registers to
that size. However this wasn’t taking into account ranges that overlap
but don’t have the same starting location. For example:

layout(location = 0, component = 0) out float a[4];
layout(location = 2, component = 1) out float b[4];

Previously, if ‘a’ was processed first then it would allocate a
register of size 4 for location 0 and it wouldn’t allocate another
register for location 2 because it would already be covered by the
range of 0. Then if something tries to write to b[2] it would try to
write past the end of the register allocated for ‘a’ and it would hit
an assert.

This patch changes it to scan for any overlapping ranges that start
within each range to calculate the maximum extent and allocate that
instead.

Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/
vs-fs-array-interleave-range.shader_test

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 6f5abf3146 "i965: Fix output register sizes when multiple variables
       share a slot."
2018-07-04 10:57:51 +02:00
Francisco Jerez
bcbc7d3a17 intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
73d60455e9 intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU
operation and less like a send.  This is less code over-all and, as a
side-effect, it now properly handles execution groups and lowering so
SIMD32 support just falls out.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
1650442026 intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
Current discard handling requires dedicating the second flag register to
discard.  However, control-flow in SIMD32 requires both flag registers
so it's incompatible with the current discard handling.  Just don't
support SIMD32+discard for now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
1811cbdc25 intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
The hardware's control flow logic is 16-wide so we're out of luck
here.  We could, in theory, support SIMD32 if we know the control-flow
is uniform but we don't have that information at this point.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
71cd9ebed9 intel/fs: Use image_deref intrinsics instead of image_var
Since we had to rewrite the deref walking loop anyway, I took the
opportunity to make it a bit clearer and more efficient.  In particular,
in the AoA case, we will now emit one minmax instead of one per array
level.

Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:54:00 -07:00
Jose Maria Casanova Crespo
b8e099e7d5 intel/fs: shuffle_64bit_data_for_32bit_write is not used anymore
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a4965842d6 intel/fs: Use new shuffle_32bit_write for all 64-bit storage writes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a4d445b93c intel/fs: shuffle_32bit_load_result_to_64bit_data is not used anymore
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
71b319a285 intel/fs: Use shuffle_from_32bit_read for 64-bit FS load_input
As the previous use of shuffle_32bit_load_result_to_64bit_data
had a source/destination overlap for 64-bit. Now a temporary destination
is used for 64-bit cases to use shuffle_from_32bit_read that doesn't
handle src/dst overlaps.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
8003ae87f4 intel/fs: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES
Previously, the shuffle function had a source/destination overlap that
needs to be avoided to use shuffle_from_32bit_read. As we can use for
the shuffle destination the destination of removed MOVs.

This change also avoids the internal MOVs done by the previous shuffle
to deal with possible overlaps.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
5565630f85 intel/fs: Use shuffle_from_32bit_read at VS load_input
shuffle_from_32bit_read manages 32-bit reads to 32-bit destination
in the same way that the previous loop so now we just call the new
function for all bitsizes, simplifying also the 64-bit load_input.

v2: Add comment about future 16-bit support (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
152bffb69b intel/fs: Use shuffle_from_32bit_read for 64-bit gs_input_load
This implementation avoids two unneeded MOVs for each 64-bit
component. One was done in the old shuffle, to avoid cases of
src/dst overlap but this is not the case. And the removed MOV
was already being being done in the shuffle.

Copy propagation wasn't able to remove them because shuffle
destination values are defined with partial writes because they
have stride == 2.

v2: Reword commit log summary (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
8b26a2d96d intel/fs: shuffle_from_32bit_read for 64-bit do_untyped_vector_read
do_untyped_vector_read is used at load_ssbo and load_shared.

The previous MOVs are removed because shuffle_from_32bit_read
can handle storing the shuffle results in the expected destination
just using the proper offset.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
c2297bdf19 intel/fs: Remove old 16-bit shuffle/unshuffle functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
fd3d8a8f79 intel/fs: Use shuffle_for_32bit_write for 16-bits store_ssbo
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
20e4732f7d intel/fs: Use shuffle_from_32bit_read to read 16-bit SSBO
Using shuffle_from_32bit_read instead of 16-bit shuffle functions
avoids the need of retype. At the same time new function are
ready for 8-bit type SSBO reads.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
22c654941b intel/fs: New shuffle_for_32bit_write and shuffle_from_32bit_read
These new shuffle functions deal with the shuffle/unshuffle operations
needed for read/write operations using 32-bit components when the
read/written components have a different bit-size (8, 16, 64-bits).
Shuffle from 32-bit to 32-bit becomes a simple MOV.

shuffle_src_to_dst takes care of doing a shuffle when source type is
smaller than destination type and an unshuffle when source type is
bigger than destination. So this new read/write functions just need
to call shuffle_src_to_dst assuming that writes use a 32-bit
destination and reads use a 32-bit source.

As shuffle_for_32bit_write/from_32bit_read components take components
in unit of source/destination types and shuffle_src_to_dst takes units
of the smallest type component, we adjust components and first_component
parameters.

To enable this new functions it is needed than there is no
source/destination overlap in the case of shuffle_from_32bit_read.
That never happens on shuffle_for_32bit_write as it allocates a new
destination register as it was at shuffle_64bit_data_for_32bit_write.

v2: Reword commit log and add comments to explain why first_component
    and components parameters are adjusted. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a5665056e5 intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function
This new function takes care of shuffle/unshuffle components of a
particular bit-size in components with a different bit-size.

If source type size is smaller than destination type size the operation
needed is a component shuffle. The opposite case would be an unshuffle.

Component units are measured in terms of the smaller type between
source and destination. As we are un/shuffling the smaller components
from/into a bigger one.

The operation allows to skip first_component number of components from
the source.

Shuffle MOVs are retyped using integer types avoiding problems with
denorms and float types if source and destination bitsize is different.
This allows to simplify uses of shuffle functions that are dealing with
these retypes individually.

Now there is a new restriction so source and destination can not overlap
anymore when calling this shuffle function. Following patches that migrate
to use this new function will take care individually of avoiding source
and destination overlaps.

v2: (Jason Ekstrand)
    - Rewrite overlap asserts.
    - Manage type_sz(src.type) == type_sz(dst.type) case using MOVs
      from source to dest. This works for 64-bit to 64-bits
      operation that on Gen7 as it doesn't support Q registers.
    - Explain that components units are based in the smallest type.
v3: - Fix unshuffle overlap assert (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Plamena Manolova
939312702e i965: Add ARB_fragment_shader_interlock support.
Adds suppport for ARB_fragment_shader_interlock. We achieve
the interlock and fragment ordering by issuing a memory fence
via sendc.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-06-01 16:36:39 +01:00
Francisco Jerez
d3cd6b7215 intel/fs: Replace the CINTERP opcode with a simple MOV
The only reason it was it's own opcode was so that we could detect it
and adjust the source register based on the payload setup.  Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.

v2 (Jason Ekstrand):
 - Break the bit which removes the CINTERP opcode into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
39de901a96 intel/fs: Use the ATTR file for FS inputs
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.

v2 (Jason Ekstrand):
 - Break into multiple patches
 - Change the units of the FS ATTR to be in logical scalars

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
4bfa2ac2ea intel/fs: Rename a local variable so it doesn't shadow component()
v2 (Jason Ekstrand):
 - Break the refactor into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Iago Toral Quiroga
5a12bdac09 i965/compiler: handle conversion to smaller type in the lowering pass for that
This rollbacks the revert of this same patch introduced in
commit 7b9c15628a.

And also squahes the following patch to prevent a piglit regression caused
by this change:

intel/compiler: Fix lower_conversions for 8-bit types.
Author: Jose Maria Casanova Crespo <jmcasanova@igalia.com>

For 8-bit types the execution type is word. A byte raw MOV has 16-bit
execution type and 8-bit destination and it shouldn't be considered
a conversion case. So there is no need to change alignment and enter
in lower_conversions for these instructions.

Fixes a regresion in the piglit test "glsl-fs-shader-stencil-export"
that is introduced with this patch from the Vulkan shaderInt16 series:
'i965/compiler: handle conversion to smaller type in the lowering
pass for that'. The problem is caused because there is already a case
in the driver that injects Byte instructions like this:

mov(8)          g127<1>UB       g2<32,8,4>UB

And the aforementioned pass was not accounting for the special
handling of the execution size of Byte instructions. This patch
fixes this.

v2: (Jason Ekstrand)
   - Simplify is_byte_raw_mov, include reference to PRM and not
   consider B <-> UB conversions as raw movs.

v3: (Matt Turner)
   - Indentation style fixes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:41:02 +02:00
Iago Toral Quiroga
a75f967388 intel/compiler: handle 16-bit to 64-bit conversions in BSW platforms
These are subject to the general restriction that anything that is converted
to 64-bit needs to be aligned to 64-bit.  We had this already in place for
32-bit to 64-bit conversions, so this patch generalizes the implementation
to take effect on any conversion to 64-bit from a source smaller than
64-bit.

Fixes assembly validation errors in the following CTS tests in BSW:
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:26:37 +02:00
Mark Janes
7b9c15628a Revert "i965/compiler: handle conversion to smaller type in the lowering pass for that"
This reverts commit 96b5153790.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-03 15:26:59 -07:00
Iago Toral Quiroga
dd41630d9a intel/compiler: implement 16-bit pack/unpack opcodes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
6318808a05 intel/compiler: fix 16-bit comparisons
NIR assumes that booleans are always 32-bit, but Intel hardware produces
16-bit booleans for 16-bit comparisons. This means that we need to convert
the 16-bit result to 32-bit.

In the future we want to add an optimization pass to clean this up and
hopefully remove the conversions.

v2 (Jason): use the type of the source for the temporary and use
            brw_reg_type_from_bit_size for the conversion to 32-bit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo
e5fc3c0717 intel/compiler: implement nir_instr_type_load_const for 16-bit constants
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
939501c8ed intel/compiler: implement conversions from 16-bit int/float to bool
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
d5a419176f intel/compiler: implement conversion between float/int 16-bit types
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
96b5153790 i965/compiler: handle conversion to smaller type in the lowering pass for that
The lowering pass was specialized to act on 64-bit to 32-bit conversions only,
but the implementation is valid for other cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
5361a87ee7 intel/compiler: fix isign for 16-bit integers
We need to use 16-bit constants with 16-bit instructions,
otherwise we get the following validation error:

"Destination stride must be equal to the ratio of the sizes of
 the execution data type to the destination type"

Because the execution data type is 4B due to the 32-bit integer
constant.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Antia Puentes
3a1df14a7b intel: activate the gl_BaseVertex lowering
Surplus code related to the basevertex is removed.

The Vertex Elements contain now:
* VE 1: <firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is_indexed_draw, 0, 0>

Also fixes unreachable message.

Fixes OpenGL CTS tests:
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters

Fixes Piglit tests:
* arb_shader_draw_parameters-drawid-indirect baseinstance
* arb_shader_draw_parameters-basevertex

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
2018-05-02 11:24:46 +02:00
Antia Puentes
0cbf29fa55 intel: emit is_indexed_draw in the same VE than gl_DrawID
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>

VE1 is it kept as it was before, VE2 additionally contains the new
system value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:23:34 +02:00