Commit graph

491 commits

Author SHA1 Message Date
Plamena Manolova
939312702e i965: Add ARB_fragment_shader_interlock support.
Adds support for ARB_fragment_shader_interlock. We achieve
the interlock and fragment ordering by issuing a memory fence
via sendc.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-06-01 16:36:39 +01:00
Francisco Jerez
d3cd6b7215 intel/fs: Replace the CINTERP opcode with a simple MOV
The only reason it was its own opcode was so that we could detect it
and adjust the source register based on the payload setup.  Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.

v2 (Jason Ekstrand):
 - Break the bit which removes the CINTERP opcode into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
39de901a96 intel/fs: Use the ATTR file for FS inputs
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.

v2 (Jason Ekstrand):
 - Break into multiple patches
 - Change the units of the FS ATTR to be in logical scalars

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
4bfa2ac2ea intel/fs: Rename a local variable so it doesn't shadow component()
v2 (Jason Ekstrand):
 - Break the refactor into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Iago Toral Quiroga
5a12bdac09 i965/compiler: handle conversion to smaller type in the lowering pass for that
This rolls back the revert of this same patch introduced in
commit 7b9c15628a.

It also squashes in the following patch to prevent a piglit regression caused
by this change:

intel/compiler: Fix lower_conversions for 8-bit types.
Author: Jose Maria Casanova Crespo <jmcasanova@igalia.com>

For 8-bit types the execution type is word. A byte raw MOV has 16-bit
execution type and 8-bit destination and it shouldn't be considered
a conversion case. So there is no need to change alignment and enter
in lower_conversions for these instructions.

Fixes a regression in the piglit test "glsl-fs-shader-stencil-export"
that is introduced with this patch from the Vulkan shaderInt16 series:
'i965/compiler: handle conversion to smaller type in the lowering
pass for that'. The problem is caused because there is already a case
in the driver that injects Byte instructions like this:

mov(8)          g127<1>UB       g2<32,8,4>UB

And the aforementioned pass was not accounting for the special
handling of the execution size of Byte instructions. This patch
fixes this.

v2: (Jason Ekstrand)
   - Simplify is_byte_raw_mov, include reference to PRM and not
   consider B <-> UB conversions as raw movs.

v3: (Matt Turner)
   - Indentation style fixes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:41:02 +02:00
Iago Toral Quiroga
a75f967388 intel/compiler: handle 16-bit to 64-bit conversions in BSW platforms
These are subject to the general restriction that anything that is converted
to 64-bit needs to be aligned to 64-bit.  We had this already in place for
32-bit to 64-bit conversions, so this patch generalizes the implementation
to take effect on any conversion to 64-bit from a source smaller than
64-bit.

Fixes assembly validation errors in the following CTS tests in BSW:
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:26:37 +02:00
Mark Janes
7b9c15628a Revert "i965/compiler: handle conversion to smaller type in the lowering pass for that"
This reverts commit 96b5153790.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-03 15:26:59 -07:00
Iago Toral Quiroga
dd41630d9a intel/compiler: implement 16-bit pack/unpack opcodes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
6318808a05 intel/compiler: fix 16-bit comparisons
NIR assumes that booleans are always 32-bit, but Intel hardware produces
16-bit booleans for 16-bit comparisons. This means that we need to convert
the 16-bit result to 32-bit.
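
A minimal sketch of the widening step described above, in Python rather than the compiler's own code. It assumes that a true boolean is represented as all ones and a false boolean as zero, so the conversion amounts to a sign extension; the function names are hypothetical.

```python
def compare_lt_16(a: int, b: int) -> int:
    """Model a 16-bit comparison that yields a 16-bit boolean.

    Assumption: true is all ones (0xFFFF) and false is 0x0000.
    """
    return 0xFFFF if a < b else 0x0000


def widen_bool16_to_bool32(b16: int) -> int:
    """Sign-extend a 16-bit boolean so it becomes a 32-bit boolean."""
    b16 &= 0xFFFF
    if b16 & 0x8000:
        # Replicate the sign bit into the upper 16 bits.
        return b16 | 0xFFFF0000
    return b16
```

With this model, a true 16-bit result widens to 0xFFFFFFFF and a false one stays 0.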

In the future we want to add an optimization pass to clean this up and
hopefully remove the conversions.

v2 (Jason): use the type of the source for the temporary and use
            brw_reg_type_from_bit_size for the conversion to 32-bit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo
e5fc3c0717 intel/compiler: implement nir_instr_type_load_const for 16-bit constants
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
939501c8ed intel/compiler: implement conversions from 16-bit int/float to bool
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
d5a419176f intel/compiler: implement conversion between float/int 16-bit types
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
96b5153790 i965/compiler: handle conversion to smaller type in the lowering pass for that
The lowering pass was specialized to act on 64-bit to 32-bit conversions only,
but the implementation is valid for other cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
5361a87ee7 intel/compiler: fix isign for 16-bit integers
We need to use 16-bit constants with 16-bit instructions,
otherwise we get the following validation error:

"Destination stride must be equal to the ratio of the sizes of
 the execution data type to the destination type"

Because the execution data type is 4B due to the 32-bit integer
constant.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Antia Puentes
3a1df14a7b intel: activate the gl_BaseVertex lowering
Surplus code related to the basevertex is removed.

The Vertex Elements now contain:
* VE 1: <firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is_indexed_draw, 0, 0>

Also fixes unreachable message.

Fixes OpenGL CTS tests:
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters

Fixes Piglit tests:
* arb_shader_draw_parameters-drawid-indirect baseinstance
* arb_shader_draw_parameters-basevertex

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
2018-05-02 11:24:46 +02:00
Antia Puentes
0cbf29fa55 intel: emit is_indexed_draw in the same VE than gl_DrawID
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>

VE1 is kept as it was before, VE2 additionally contains the new
system value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:23:34 +02:00
Jose Maria Casanova Crespo
eb96bd57c7 i965/fs: retype offset_reg to UD at load_ssbo
All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:

mov(8) vgrf9:UD, vgrf7:D

This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-04-20 13:30:12 +02:00
Antia Puentes
c32e1035cb intel: Handle firstvertex in an identical way to BaseVertex
Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.

The Vertex Elements are kept as follows:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>

v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
2018-04-19 15:57:45 -07:00
Rob Clark
51888bf07d nir+drivers: add helpers to get # of src/dest components
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-03 06:08:56 -04:00
Jason Ekstrand
7e38f49a8f intel/fs: Don't emit a dest copy for image ops with has_dest == false
This was causing us to walk dest_components times over a thing with no
destination.  This happened to work because all of the image intrinsics
without a destination also happened to have dest_components == 0.  We
shouldn't be reading dest_components if has_dest == false.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-03-27 18:18:21 -07:00
Jason Ekstrand
884d27bcf6 nir: Rename image intrinsics to image_var
Generated with

git grep -l nir_intrinsic_image | xargs \
sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g'

and some manual fixing in nir_intrinsics.h

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-23 13:48:11 +11:00
Jason Ekstrand
8b4a5e641b intel/fs: Add support for subgroup quad operations
NIR has code to lower these away for us but we can do significantly
better in many cases with register regioning and SIMD4x2.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
2292b20b29 intel/fs: Implement reduce and scan operations
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
90c9f29518 i965/fs: Add support for nir_intrinsic_shuffle
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
7cfece820d i965/fs: Support nir_intrinsic_vote_feq
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
44681e4795 nir: Generalize nir_intrinsic_vote_eq
The SPIR-V extension wants us to be able to do an AllEqual on any vector
or scalar type.  This has two implications:

 1) We need to be able to handle vectors so we switch the vote_eq
    intrinsics to be vectorized intrinsics.

 2) We need to handle floats which have different behavior with respect
    to +-0, NaN, etc. than the integer variant so we need two variants.
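
The difference between the two variants can be illustrated with a small Python sketch. It assumes the float variant follows ordinary IEEE 754 comparison and the integer variant compares raw 32-bit patterns; the helper names are hypothetical and are not the NIR intrinsics themselves.

```python
import struct


def float_bits(value: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of value as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def all_equal_float(values) -> bool:
    """Float-style AllEqual: every value compares equal to the first one."""
    return all(v == values[0] for v in values)


def all_equal_int(values) -> bool:
    """Integer-style AllEqual: every 32-bit pattern matches the first one."""
    first = float_bits(values[0])
    return all(float_bits(v) == first for v in values)
```

Under these assumptions, +0.0 and -0.0 are equal as floats but not as bit patterns, while two identical NaNs are equal as bit patterns but not as floats.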

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
974daec495 i965/fs: Implement basic SPIR-V subgroup intrinsics
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Francisco Jerez
4b4838b1ae Revert "i965/fs: Predicate byte scattered writes if needed"
This reverts commit a4031bdfa9.  It's
redundant with the sample mask predication done at this point by the
common logical send lowering infrastructure, and rather buggy because
it wasn't applying the correct sample mask in shaders using discard,
since the dispatch mask returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS
doesn't reflect samples discarded by the shader, so it could have led
to data corruption in fragment shader invocations that execute discard
based on a non-dynamically uniform condition.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-02 11:28:56 -08:00
Jose Maria Casanova Crespo
02266f9ba1 spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets can be multiples of 2 bytes. Some assertions are
updated so that offsets only need to be a multiple of the size of the base
type, although this cannot always be assumed because doubles are not
always aligned to 8 bytes.

For 16-bit types, the push constant offset takes into account the
internal offset within the 32-bit uniform bucket, adding 2 bytes when we
access elements that are not 32-bit aligned. In all 32-bit aligned cases
it is just 0.
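
As a rough illustration of the offset arithmetic, the sketch below splits a byte offset into a 32-bit bucket index and the offset inside that bucket. The function name and the return shape are hypothetical; only the 2-byte granularity for 16-bit types comes from the description above.

```python
def split_16bit_push_constant_offset(byte_offset: int):
    """Split a byte offset into (32-bit bucket index, offset inside the bucket).

    For 16-bit types the offset must be a multiple of 2 bytes, so the
    offset inside the bucket is either 0 (32-bit aligned) or 2.
    """
    if byte_offset % 2 != 0:
        raise ValueError("16-bit push constants must be 2-byte aligned")
    return byte_offset // 4, byte_offset % 4
```

For example, byte offset 6 lands in bucket 1 with an internal offset of 2, while byte offset 8 lands in bucket 2 with an internal offset of 0.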

v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo
69be3a82ca i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that the offset is a multiple
of 4.

Taking into account that VK_KHR_relaxed_block_layout is available
in ANV, we can only guarantee that when we have a constant offset
that is a multiple of 4. For non-constant offsets we always use
byte_scattered_write.
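
The selection rule can be sketched as follows. This is a simplified model, not the driver code: it treats the whole store as one unit, whereas the real implementation may handle components in smaller groups. The function name is hypothetical; the two message names are the ones used in this commit message.

```python
from typing import Optional


def pick_16bit_store_message(num_components: int,
                             const_offset: Optional[int]) -> str:
    """Pick a write message for a 16-bit SSBO store (simplified model).

    const_offset is the byte offset when it is known at compile time,
    or None for a non-constant offset.
    """
    made_of_pairs = num_components % 2 == 0
    offset_is_aligned = const_offset is not None and const_offset % 4 == 0
    if made_of_pairs and offset_is_aligned:
        return "untyped_surface_write"
    # Non-constant or unaligned offsets, and lone components, fall back.
    return "byte_scattered_write"
```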

v2: (Jason Ekstrand)
    - Assert offset_reg to be multiple of 4 if it is immediate.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo
8dd8be0323 i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout
16-bit load_ubo/ssbo operations that call do_untyped_vector_read don't
guarantee that offsets are multiples of 4 bytes, as required by the
untyped_read message. This happens, for example, in the case of f16mat3x3
when VK_KHR_relaxed_block_layout is enabled.

Vector reads with non-constant offsets are implemented with multiple
byte_scattered_read messages, which do not require 32-bit aligned offsets.

Now for all constant offsets we can use the untyped_read_surface message.
In the case of constant offsets not aligned to 32 bits, we calculate a
32-bit aligned start offset and use the shuffle_32bit_load_result_to_16bit_data
function and the first_component parameter to skip the copy of the unneeded
component.
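
The constant-offset case can be sketched as a small planning function. It is only a model of the arithmetic described above (an aligned start, an aligned end, and end - start giving the number of 32-bit components to read); the function name and return tuple are hypothetical.

```python
def plan_const_offset_read(offset_bytes: int, num_components: int):
    """Plan a 32-bit aligned read covering num_components 16-bit values.

    Returns (start, num_dwords, first_component), where start is the
    32-bit aligned byte offset to read from, num_dwords is how many
    32-bit components to read, and first_component is how many leading
    16-bit components must be skipped after the read.
    """
    assert offset_bytes % 2 == 0, "16-bit data is at least 2-byte aligned"
    start = offset_bytes & ~3                           # round down to a dword
    end = (offset_bytes + 2 * num_components + 3) & ~3  # round up to a dword
    num_dwords = (end - start) // 4
    first_component = (offset_bytes - start) // 2
    return start, num_dwords, first_component
```

For a three-component read at byte offset 6, this gives a start of 4, two dwords to read, and one leading 16-bit component to skip.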

v2: (Jason Ekstrand)
    Always use untyped_read_surface messages when we have constant offsets.

v3: (Jason Ekstrand)
    Simplify loop for reads with non constant offsets.
    Use end - start to calculate the number of 32-bit components to read with
    constant offsets.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo
2dd94f462b i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components
This helper, used to load 16-bit components from 32-bit reads, now allows
skipping components with the new parameter first_component. The semantics
are now to skip components until we reach first_component, and then read the
number of components passed to the function.

All previous uses of the helper are updated to use 0 as first_component.
This allows reading 16-bit components when the first one is not 32-bit
aligned, enabling more uses of untyped reads with 16-bit types.
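
The skip-then-read semantics can be modeled on plain integers. This is a sketch, not the helper itself: the name unpack_16bit_components is hypothetical, and it assumes the low half of each 32-bit value holds the earlier 16-bit component.

```python
def unpack_16bit_components(dwords, first_component: int, num_components: int):
    """Extract 16-bit components from a list of 32-bit values.

    Components are numbered with the low half of each dword first.
    The first first_component entries are skipped, then num_components
    entries are returned.
    """
    halves = []
    for dword in dwords:
        halves.append(dword & 0xFFFF)          # low 16 bits
        halves.append((dword >> 16) & 0xFFFF)  # high 16 bits
    return halves[first_component:first_component + num_components]
```

Passing 0 as first_component reproduces the previous behaviour of reading from the start of the loaded data.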

v2: (Jason Ekstrand)
    Change parameters order to first_component, num_components

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo
67d7dd594e isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit
The surfaces that back the GPU buffers have a bounds check that
treats accesses to partial dwords as out-of-bounds.
For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the
last two bytes would always be read as 0 or writes to them ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-byte multiples so that partial dwords can be read/written.
Adding an unconditional +2 to the size of buffers that are not a multiple
of 2 solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = isl_align(buffer_size, 4) +
               (isl_align(buffer_size, 4) - buffer_size)

So when we calculate the buffer_size back in the backend, we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)

These buffer requirements are also exposed when robust buffer access
is enabled, so these buffer sizes are recommended to be a multiple of 4.
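
The two formulas above can be checked with a short round-trip sketch. The Python isl_align here is a stand-in written for this example (it assumes a power-of-two alignment); the other two function names are hypothetical wrappers around the expressions quoted in this message.

```python
def isl_align(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def surface_size_for_buffer(buffer_size: int) -> int:
    """Encode the padding amount in the low two bits of the surface size."""
    aligned = isl_align(buffer_size, 4)
    return aligned + (aligned - buffer_size)


def buffer_size_from_surface(surface_size: int) -> int:
    """Recover the original buffer size from the padded surface size."""
    return (surface_size & ~3) - (surface_size & 3)
```

For a buffer of 6 bytes the surface size becomes 8 + 2 = 10, and decoding 10 gives 8 - 2 = 6 again. Sizes that are already a multiple of 4 are left unchanged.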

v2: (Jason Ekstrand)
    Move padding logic from anv to isl_surface_state.
    Move calculation of original size from spirv to driver backend.
v3: (Jason Ekstrand)
    Rename some variables and use a similar expression when calculating
    padding as when obtaining the original buffer size.
    Avoid use of unnecessary component call at brw_fs_nir.
v4: (Jason Ekstrand)
    Complete comment with buffer size calculation explanation in brw_fs_nir.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-28 21:37:40 -08:00
Iago Toral Quiroga
cb9dbd6dec i965/compiler: clean up nir_intrinsic_load_input for vertex shaders
This code to reset the type of the source and destination is not
necessary since we never manipulate the types. It looks like a
leftover from a time when we had to retype to float temporarily
to handle 64-bit inputs.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-14 12:00:14 +01:00
Iago Toral Quiroga
4917d38321 intel/compiler: fix first_component for 64-bit types on vertex inputs
Divide it by two as we do for other stages. This is because the
component layout qualifier is always in 32-bit units.

Fixes issues in a new CTS test (still WIP):
KHR-GL45.enhanced_layouts.varying_double_components

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-14 12:00:14 +01:00
Jason Ekstrand
3d2b157e23 i965/fs: Use UW types when using V immediates
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}.  In particular, the bottom
four nibbles are repeated instead of the top four being taken.  (A mov
of 0x00003210V yields the same result.)  This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.

Commit 6132992cdb was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them).  This is
also potentially an issue for geometry shader input pulls and SampleID
setup.  The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.
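
The behaviour described above can be modeled in a few lines of Python. This is only a model built from the description in this message, not from hardware documentation: it assumes a V immediate packs eight signed 4-bit values with element 0 in the lowest nibble, and it lists elements starting from element 0. Both function names are hypothetical.

```python
def decode_v_immediate(imm: int):
    """Expand a packed V immediate into eight signed 4-bit elements.

    Element i is taken from bits [4*i+3 : 4*i] and sign-extended.
    """
    elements = []
    for i in range(8):
        nibble = (imm >> (4 * i)) & 0xF
        elements.append(nibble - 16 if nibble & 0x8 else nibble)
    return elements


def decode_v_immediate_w_type_bug(imm: int):
    """Model of the reported bug: the bottom four nibbles are repeated."""
    expected = decode_v_immediate(imm)
    return [expected[i % 4] for i in range(8)]
```

In this model 0x76543210V and 0x00003210V produce the same buggy result, which matches the observation that the top four nibbles are never used.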

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: 6132992cdb
2018-01-11 14:31:38 -08:00
Kenneth Graunke
a1afef8de0 i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.
These are the same, we don't need a separate opcode enum per backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-30 20:30:34 -08:00
Jose Maria Casanova Crespo
a1e257a5bf i965/fs: Use untyped_surface_read for 16-bit load_ssbo
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate on one component at a time so we needed to emit as many messages
as components.

But 16-bit vec2 and vec4 are multiples of 32 bits, so we can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read it is unshuffled to return the proper
16-bit components. The vec3 case is handled like vec4, but the 4th component
is ignored.

16-bit scalars are read using one byte_scattered_read message.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types instead of HF and F in shuffle to avoid rounding
    errors (Jason Ekstrand)
    Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript instead of changing type and stride  (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo
ce2e572c4c i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg
Currently, we use byte-scattered write messages for storing 16-bit
values into an SSBO. This is because untyped surface messages have a fixed
32-bit size.

This patch optimizes these 16-bit writes by combining 2 values (e.g.,
two consecutive components aligned to 32 bits) into a 32-bit register,
packing the two 16-bit words.

16-bit single-component values will continue to use byte-scattered
write messages. The same will happen when the first consecutive
component is not aligned to 32 bits.

This optimization reduces the number of SEND messages used for storing
16-bit values potentially by a factor of 2 or 4, which cuts down execution
time significantly because byte-scattered writes are an expensive
operation as they only write one component per message.
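
The packing step can be sketched on plain integers. The function name is hypothetical, and the sketch assumes the first component of each pair goes in the low 16 bits of the 32-bit value.

```python
def pack_16bit_pairs(components):
    """Pack consecutive pairs of 16-bit values into 32-bit values.

    The first value of each pair goes in the low half and the second
    in the high half.
    """
    assert len(components) % 2 == 0, "expects whole pairs"
    return [
        (low & 0xFFFF) | ((high & 0xFFFF) << 16)
        for low, high in zip(components[0::2], components[1::2])
    ]
```

A 16-bit vec4 then becomes two 32-bit values that one untyped write can store, instead of four separate byte-scattered writes.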

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using shuffle 16 write and enable writes
    of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
    - Reorganize code to avoid duplication. (Jason Ekstrand)
    - Include new comments to explain the length calculations to
      fix alignment issues of components. (Jason Ekstrand)
    - Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand)
v4: (Jason Ekstrand)
    - Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
    - Comment about why shuffle is needed when using byte_scattered_write.

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo
3db31c0b06 i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components
These helpers are used to load/store 16-bit types from/to 32-bit
components.

The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way to the analogous functions for handling 64-bit types.

v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo
fa4a9d63bb i965/fs: Use byte scattered read for 16-bit load_ssbo
Used to enable 16-bit reads at do_untyped_vector_read, that is used on
the following intrinsics:

   * nir_intrinsic_load_shared
   * nir_intrinsic_load_ssbo

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: - Add bitsize to scattered read operation (Jason Ekstrand)
    - Remove implementation of 16-bit UBO read from this patch.
    - Avoid assertion at opt_algebraic caused by ADD of two IMM with
      offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
      (Jose Maria Casanova)
v4: (Jason Ekstrand)
    - Put if case for 16-bits at the beginning of the if ladder.
    - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro
a4031bdfa9 i965/fs: Predicate byte scattered writes if needed
While on Untyped Surface messages the bits of the execution mask are
ANDed with the corresponding bits of the Pixel/Sample Mask, that is
not the case for byte scattered writes. That is needed to avoid ssbo
stores writing on helper invocations. So when that can have an effect, we
load the sample mask and predicate the send message.

Note: the need for this patch was tested with a custom test. Right now
the 16-bit storage CTS tests don't need this path in order to get a
full pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro
96f1926aab i965/fs: Use byte_scattered_write on 16-bit store_ssbo
We need to rely on byte scattered writes as untyped writes are 32-bit
size. We could try to keep using 32-bit messages when we have two or
four 16-bit elements, but for simplicity's sake, we use the same message
for any number of components. We revisit this approach in the following
patches.

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: (Jason Ekstrand)
    - Include bit_size to scattered write message and remove namespace
      specific for scattered messages.
    - Move comment to proper place.
    - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
    (Jose Maria Casanova)
    - Take into account that get_nir_src returns now WORD types for
      16-bit sources instead of DWORD.
v4: (Jason Ekstrand)
    - Rename length variable to num_components.
    - Include assertions before emit_untyped_write.
    - Remove type_slot in favor of num_slot and first_slot.

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro
82fa4d45e7 i965/fs: Enable rounding mode on f2f16 ops
By default we don't set the rounding mode. We only set
round-to-nearest-even or round-to-zero mode if explicitly set from nir.

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
    with the rounding mode (Curro)

v3: Use new helper brw_rnd_mode_from_nir_op  (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro
5d5ee507fb i965/fs: Handle 32-bit to 16-bit conversions
Conversions to 16-bit need alignment between the 16-bit
and 32-bit types. So the conversion operations unpack 16-bit types
with a stride=2 and then apply a MOV with the conversion.

v2 (Jason Ekstrand):
  - Avoid the general use of stride=2 for 16-bit register types.

v3 (Topi Pohjolainen)
  - Code style fix
   (Jason Ekstrand)
  - Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
    because conversion to f16 with undefined rounding is explicit

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Kenneth Graunke
ff964916dc i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code.
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case.  Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.

The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-15 09:37:32 -08:00
Matt Turner
6ac2d16901 i965/fs: Fix extract_i8/u8 to a 64-bit destination
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not bytes to quadwords.

For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and then sign extend that word to a
quadword.
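
The two paths can be sketched with integer arithmetic. This only models the data flow, not the instructions that are emitted; the function names are hypothetical, and the unsigned case assumes the byte of interest sits in the low half of the word that is read.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend value from from_bits to to_bits, as an unsigned pattern."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def extract_u8_to_u64(word: int) -> int:
    """Unsigned path: read a word, AND off the high byte, zero-extend."""
    return word & 0x00FF


def extract_i8_to_i64(byte: int) -> int:
    """Signed path: byte to word first, then that word to a quadword."""
    word = sign_extend(byte, 8, 16)
    return sign_extend(word, 16, 64)
```

In this model the signed byte 0x80 becomes 0xFF80 as a word and 0xFFFFFFFFFFFFFF80 as a quadword.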

Fixes the following test on CHV, BXT, and GLK:
   KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-14 10:56:18 -08:00
Matt Turner
cfcfa0b9cd i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK
Fixes the following tests on CHV, BXT, and GLK:
    KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
    dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
2017-11-14 10:56:18 -08:00
Jason Ekstrand
d002950e54 intel/fs/nir: Return Q types from brw_reg_type_for_bit_size
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-11-07 10:41:24 -08:00
Jason Ekstrand
dee58ecd2e intel/fs/nir: Use Q immediates for load_const on gen8+
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-11-07 10:41:24 -08:00