fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Jose Maria Casanova Crespo	20e4732f7d	intel/fs: Use shuffle_from_32bit_read to read 16-bit SSBO Using shuffle_from_32bit_read instead of 16-bit shuffle functions avoids the need of retype. At the same time new function are ready for 8-bit type SSBO reads. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	22c654941b	intel/fs: New shuffle_for_32bit_write and shuffle_from_32bit_read These new shuffle functions deal with the shuffle/unshuffle operations needed for read/write operations using 32-bit components when the read/written components have a different bit-size (8, 16, 64-bits). Shuffle from 32-bit to 32-bit becomes a simple MOV. shuffle_src_to_dst takes care of doing a shuffle when source type is smaller than destination type and an unshuffle when source type is bigger than destination. So this new read/write functions just need to call shuffle_src_to_dst assuming that writes use a 32-bit destination and reads use a 32-bit source. As shuffle_for_32bit_write/from_32bit_read components take components in unit of source/destination types and shuffle_src_to_dst takes units of the smallest type component, we adjust components and first_component parameters. To enable this new functions it is needed than there is no source/destination overlap in the case of shuffle_from_32bit_read. That never happens on shuffle_for_32bit_write as it allocates a new destination register as it was at shuffle_64bit_data_for_32bit_write. v2: Reword commit log and add comments to explain why first_component and components parameters are adjusted. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a5665056e5	intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function This new function takes care of shuffle/unshuffle components of a particular bit-size in components with a different bit-size. If source type size is smaller than destination type size the operation needed is a component shuffle. The opposite case would be an unshuffle. Component units are measured in terms of the smaller type between source and destination. As we are un/shuffling the smaller components from/into a bigger one. The operation allows to skip first_component number of components from the source. Shuffle MOVs are retyped using integer types avoiding problems with denorms and float types if source and destination bitsize is different. This allows to simplify uses of shuffle functions that are dealing with these retypes individually. Now there is a new restriction so source and destination can not overlap anymore when calling this shuffle function. Following patches that migrate to use this new function will take care individually of avoiding source and destination overlaps. v2: (Jason Ekstrand) - Rewrite overlap asserts. - Manage type_sz(src.type) == type_sz(dst.type) case using MOVs from source to dest. This works for 64-bit to 64-bits operation that on Gen7 as it doesn't support Q registers. - Explain that components units are based in the smallest type. v3: - Fix unshuffle overlap assert (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Plamena Manolova	939312702e	i965: Add ARB_fragment_shader_interlock support. Adds suppport for ARB_fragment_shader_interlock. We achieve the interlock and fragment ordering by issuing a memory fence via sendc. Signed-off-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-06-01 16:36:39 +01:00
Francisco Jerez	d3cd6b7215	intel/fs: Replace the CINTERP opcode with a simple MOV The only reason it was it's own opcode was so that we could detect it and adjust the source register based on the payload setup. Now that we're using the ATTR file for FS inputs, there's no point in having a magic opcode for this. v2 (Jason Ekstrand): - Break the bit which removes the CINTERP opcode into its own patch Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Francisco Jerez	39de901a96	intel/fs: Use the ATTR file for FS inputs This replaces the special magic opcodes which implicitly read inputs with explicit use of the ATTR file. v2 (Jason Ekstrand): - Break into multiple patches - Change the units of the FS ATTR to be in logical scalars Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Francisco Jerez	4bfa2ac2ea	intel/fs: Rename a local variable so it doesn't shadow component() v2 (Jason Ekstrand): - Break the refactor into its own patch Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Iago Toral Quiroga	5a12bdac09	i965/compiler: handle conversion to smaller type in the lowering pass for that This rollbacks the revert of this same patch introduced in commit `7b9c15628a`. And also squahes the following patch to prevent a piglit regression caused by this change: intel/compiler: Fix lower_conversions for 8-bit types. Author: Jose Maria Casanova Crespo <jmcasanova@igalia.com> For 8-bit types the execution type is word. A byte raw MOV has 16-bit execution type and 8-bit destination and it shouldn't be considered a conversion case. So there is no need to change alignment and enter in lower_conversions for these instructions. Fixes a regresion in the piglit test "glsl-fs-shader-stencil-export" that is introduced with this patch from the Vulkan shaderInt16 series: 'i965/compiler: handle conversion to smaller type in the lowering pass for that'. The problem is caused because there is already a case in the driver that injects Byte instructions like this: mov(8) g127<1>UB g2<32,8,4>UB And the aforementioned pass was not accounting for the special handling of the execution size of Byte instructions. This patch fixes this. v2: (Jason Ekstrand) - Simplify is_byte_raw_mov, include reference to PRM and not consider B <-> UB conversions as raw movs. v3: (Matt Turner) - Indentation style fixes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393 Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-05 12:41:02 +02:00
Iago Toral Quiroga	a75f967388	intel/compiler: handle 16-bit to 64-bit conversions in BSW platforms These are subject to the general restriction that anything that is converted to 64-bit needs to be aligned to 64-bit. We had this already in place for 32-bit to 64-bit conversions, so this patch generalizes the implementation to take effect on any conversion to 64-bit from a source smaller than 64-bit. Fixes assembly validation errors in the following CTS tests in BSW: dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64 dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64 dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64 Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-05 12:26:37 +02:00
Mark Janes	7b9c15628a	Revert "i965/compiler: handle conversion to smaller type in the lowering pass for that" This reverts commit `96b5153790`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393 Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-05-03 15:26:59 -07:00
Iago Toral Quiroga	dd41630d9a	intel/compiler: implement 16-bit pack/unpack opcodes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:26 +02:00
Iago Toral Quiroga	6318808a05	intel/compiler: fix 16-bit comparisons NIR assumes that booleans are always 32-bit, but Intel hardware produces 16-bit booleans for 16-bit comparisons. This means that we need to convert the 16-bit result to 32-bit. In the future we want to add an optimization pass to clean this up and hopefully remove the conversions. v2 (Jason): use the type of the source for the temporary and use brw_reg_type_from_bit_size for the conversion to 32-bit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo	e5fc3c0717	intel/compiler: implement nir_instr_type_load_const for 16-bit constants Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:25 +02:00
Iago Toral Quiroga	939501c8ed	intel/compiler: implement conversions from 16-bit int/float to bool Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:25 +02:00
Iago Toral Quiroga	d5a419176f	intel/compiler: implement conversion between float/int 16-bit types Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:25 +02:00
Iago Toral Quiroga	96b5153790	i965/compiler: handle conversion to smaller type in the lowering pass for that The lowering pass was specialized to act on 64-bit to 32-bit conversions only, but the implementation is valid for other cases. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:25 +02:00
Iago Toral Quiroga	5361a87ee7	intel/compiler: fix isign for 16-bit integers We need to use 16-bit constants with 16-bit instructions, otherwise we get the following validation error: "Destination stride must be equal to the ratio of the sizes of the execution data type to the destination type" Because the execution data type is 4B due to the 32-bit integer constant. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:25 +02:00
Antia Puentes	3a1df14a7b	intel: activate the gl_BaseVertex lowering Surplus code related to the basevertex is removed. The Vertex Elements contain now: * VE 1: <firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is_indexed_draw, 0, 0> Also fixes unreachable message. Fixes OpenGL CTS tests: * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters Fixes Piglit tests: * arb_shader_draw_parameters-drawid-indirect baseinstance * arb_shader_draw_parameters-basevertex Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678	2018-05-02 11:24:46 +02:00
Antia Puentes	0cbf29fa55	intel: emit is_indexed_draw in the same VE than gl_DrawID The Vertex Elements are now: * VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is-indexed-draw, 0, 0> VE1 is it kept as it was before, VE2 additionally contains the new system value. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-02 11:23:34 +02:00
Jose Maria Casanova Crespo	eb96bd57c7	i965/fs: retype offset_reg to UD at load_ssbo All operations with offset_reg at do_vector_read are done with UD type. So copy propagation was not working through the generated MOVs: mov(8) vgrf9:UD, vgrf7:D This change allows removing the MOV generated for reading the first components for 16-bit and 64-bit ssbo reads with non-constant offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-04-20 13:30:12 +02:00
Antia Puentes	c32e1035cb	intel: Handle firstvertex in an identical way to BaseVertex Until we set gl_BaseVertex to zero for non-indexed draw calls both have an identical value. The Vertex Elements are kept like that: * VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <Draw ID, 0, 0, 0> v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.	2018-04-19 15:57:45 -07:00
Rob Clark	51888bf07d	nir+drivers: add helpers to get # of src/dest components Add helpers to get the number of src/dest components for an intrinsic, and update spots that were open-coding this logic to use the helpers instead. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-04-03 06:08:56 -04:00
Jason Ekstrand	7e38f49a8f	intel/fs: Don't emit a des copy for image ops with has_dest == false This was causing us to walk dest_components times over a thing with no destination. This happened to work because all of the image intrinsics without a destination also happened to have dest_components == 0. We shouldn't be reading dest_components if has_dest == false. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-03-27 18:18:21 -07:00
Jason Ekstrand	884d27bcf6	nir: Rename image intrinsics to image_var Generated with git grep -l nir_intrinsic_image \| xargs \ sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g' and some manual fixing in nir_intrinsics.h Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-23 13:48:11 +11:00
Jason Ekstrand	8b4a5e641b	intel/fs: Add support for subgroup quad operations NIR has code to lower these away for us but we can do significantly better in many cases with register regioning and SIMD4x2. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	2292b20b29	intel/fs: Implement reduce and scan opeprations Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	90c9f29518	i965/fs: Add support for nir_intrinsic_shuffle Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	7cfece820d	i965/fs: Support nir_intrinsic_vote_feq Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	44681e4795	nir: Generalize nir_intrinsic_vote_eq The SPIR-V extension wants us to be able to do an AllEqual on any vector or scalar type. This has two implications: 1) We need to be able to handle vectors so we switch the vote_eq intrinsics to be vectorized intrinsics. 2) We need to handle floats which have different behavior with respect to +-0, NaN, etc. than the integer variant so we need two variants. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	974daec495	i965/fs: Implement basic SPIR-V subgroup intrinsics Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Francisco Jerez	4b4838b1ae	Revert "i965/fs: Predicate byte scattered writes if needed" This reverts commit `a4031bdfa9`. It's redundant with the sample mask predication done at this point by the common logical send lowering infrastructure, and rather buggy because it wasn't applying the correct sample mask in shaders using discard, since the dispatch mask returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS doesn't reflect samples discarded by the shader, so it could have led to data corruption in fragment shader invocations that execute discard based on a non-dynamically uniform condition. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-02 11:28:56 -08:00
Jose Maria Casanova Crespo	02266f9ba1	spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned The introduction of 16-bit types with VK_KHR_16bit_storages implies that push constant offsets could be multiple of 2-bytes. Some assertions are updated so offsets should be just multiple of size of the base type but in some cases we can not assume it as doubles aren't aligned to 8 bytes in some cases. For 16-bit types, the push constant offset takes into account the internal offset in the 32-bit uniform bucket adding 2-bytes when we access not 32-bit aligned elements. In all 32-bit aligned cases it just becomes 0. v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	69be3a82ca	i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout Restrict the use of untyped_surface_write with 16-bit pairs in ssbo to the cases where we can guarantee that offset is multiple of 4. Taking into account that VK_KHR_relaxed_block_layout is available in ANV we can only guarantee that when we have a constant offset that is multiple of 4. For non constant offsets we will always use byte_scattered_write. v2: (Jason Ekstrand) - Assert offset_reg to be multiple of 4 if it is immediate. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	8dd8be0323	i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout 16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are multiple of 4-bytes as required by untyped_read message. This happens for example in the case of f16mat3x3 when then VK_KHR_relaxed_block_layout is enabled. Vectors reads when we have non-constant offsets are implemented with multiple byte_scattered_read messages that not require 32-bit aligned offsets. Now for all constant offsets we can use the untyped_read_surface message. In the case of constant offsets not aligned to 32-bits, we calculate a start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data function and the first_component parameter to skip the copy of the unneeded component. v2: (Jason Ekstrand) Use untyped_read_surface messages always we have constant offsets. v3: (Jason Ekstrand) Simplify loop for reads with non constant offsets. Use end - start to calculate the number of 32-bit components to read with constant offsets. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	2dd94f462b	i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components This helper used to load 16bit components from 32-bits read now allows skipping components with the new parameter first_component. The semantics now skip components until we reach the first_component, and then reads the number of components passed to the function. All previous uses of the helper are updated to use 0 as first_component. This will allow read 16-bit components when the first one is not aligned 32-bit. Enabling more usages of untyped_reads with 16-bit types. v2: (Jason Ektrand) Change parameters order to first_component, num_components Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	67d7dd594e	isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit The surfaces that backup the GPU buffers have a boundary check that considers that access to partial dwords are considered out-of-bounds. For example, buffers with 1,3 16-bit elements has size 2 or 6 and the last two bytes would always be read as 0 or its writting ignored. The introduction of 16-bit types implies that we need to align the size to 4-bytew multiples so that partial dwords could be read/written. Adding an inconditional +2 size to buffers not being multiple of 2 solves this issue for the general cases of UBO or SSBO. But, when unsized arrays of 16-bit elements are used it is not possible to know if the size was padded or not. To solve this issue the implementation calculates the needed size of the buffer surfaces, as suggested by Jason: surface_size = isl_align(buffer_size, 4) + (isl_align(buffer_size, 4) - buffer_size) So when we calculate backwards the buffer_size in the backend we update the resinfo return value with: buffer_size = (surface_size & ~3) - (surface_size & 3) It is also exposed this buffer requirements when robust buffer access is enabled so these buffer sizes recommend being multiple of 4. v2: (Jason Ekstrand) Move padding logic fron anv to isl_surface_state. Move calculus of original size from spirv to driver backend. v3: (Jason Ekstrand) Rename some variables and use a similar expresion when calculating. padding than when obtaining the original buffer size. Avoid use of unnecesary component call at brw_fs_nir. v4: (Jason Ekstrand) Complete comment with buffer size calculus explanation in brw_fs_nir. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Iago Toral Quiroga	cb9dbd6dec	i965/compiler: clean up nir_intrinsic_load_input for vertex shaders This code to re-set the type of the source and destination is not necessary since we never manipulate the types. Looks like a left over from a time where we had to retype to float temporarily to handle 64-bit inputs. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-14 12:00:14 +01:00
Iago Toral Quiroga	4917d38321	intel/compiler: fix first_component for 64-bit types on vertex inputs Divide it by two as we do for other stages. This is because the component layout qualifier is always in 32-bit units. Fixes issues in a new CTS test (still WIP): KHR-GL45.enhanced_layouts.varying_double_components Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-14 12:00:14 +01:00
Jason Ekstrand	3d2b157e23	i965/fs: Use UW types when using V immediates Gen 10 has a strange hardware bug involving V immediates with W types. It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2 getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom four nibbles are repeated instead of the top four being taken. (A mov of 0x00003210V yields the same result.) This bug does not appear in any hardware documentation as far as we can tell and the simulator does not implement the bug either. Commit `6132992cdb` was mostly a no-op except that it changed the type of the subgroup invocation from UW to W and caused us to tickle this bug with basically every compute shader that uses any sort of invocation ID (which is most of them). This is also potentially an issue for geometry shader input pulls and SampleID setup. The easy solution is just to change the few places where we use a vector integer immediate with a W type to use a UW type. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org Fixes: `6132992cdb`	2018-01-11 14:31:38 -08:00
Kenneth Graunke	a1afef8de0	i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes. These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-30 20:30:34 -08:00
Jose Maria Casanova Crespo	a1e257a5bf	i965/fs: Use untyped_surface_read for 16-bit load_ssbo SSBO loads were using byte_scattered read messages as they allow reading 16-bit size components. byte_scattered messages can only operate one component at a time so we needed to emit as many messages as components. But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the untyped_surface_read message to read pairs of 16-bit components using only one message. Once each pair is read it is unshuffled to return the proper 16-bit components. vec3 case is assimilated to vec4 but the 4th component is ignored. 16-bit scalars are read using one byte_scattered_read message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using unshuffle 16 reads (Chema Casanova) v3: Use W and D types insead of HF and F in shuffle to avoid rounding erros (Jason Ekstrand) Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand) v4: Use subscript insead of chaging type and stride (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	ce2e572c4c	i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg Currently, we use byte-scattered write messages for storing 16-bit into an SSBO. This is because untyped surface messages have a fixed 32-bit size. This patch optimizes these 16-bit writes by combining 2 values (e.g, two consecutive components aligned with 32-bits) into a 32-bit register, packing the two 16-bit words. 16-bit single component values will continue to use byte-scattered write messages. The same will happens when the first consecutive component is not aligned 32-bits. This optimization reduces the number of SEND messages used for storing 16-bit values potentially by 2 or 4, which cuts down execution time significantly because byte-scattered writes are an expensive operation as they only write a component for message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using shuffle 16 write and enable writes of 16bit vec4 with only one message of 32-bits. (Chema Casanova) v3: - Fix coding style (Eduardo Lima) - Reorganize code to avoid duplication. (Jason Ekstrand) - Include new comments to explain the length calculations to fix alignment issues of components. (Jason Ekstrand) - Fix issues with writemask yz with 16-bit writes. (Jason Ektrand) v4: (Jason Ekstrand) - Reorganize 64-bit ssbo-writes to avoid using slots_per_component. - Comment about why suffle is needed when using byte_scattered_write. Signed-off-by: Eduardo Lima <elima@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	3db31c0b06	i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components This helpers are used to load/store 16-bit types from/to 32-bit components. The functions shuffle_32bit_load_result_to_16bit_data and shuffle_16bit_data_for_32bit_write are implemented in a similar way than the analogous functions for handling 64-bit types. v1: Explain need of temporary in shuffle operations. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	fa4a9d63bb	i965/fs: Use byte scattered read for 16-bit load_ssbo Used to enable 16-bit reads at do_untyped_vector_read, that is used on the following intrinsics: * nir_intrinsic_load_shared * nir_intrinsic_load_ssbo v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: - Add bitsize to scattered read operation (Jason Ekstrand) - Remove implementation of 16-bit UBO read from this patch. - Avoid assertion at opt_algebraic caused by ADD of two IMM with offset with BRW_REGISTER_TYPE_UD type found on matrix tests. (Jose Maria Casanova) v4: (Jason Ekstrand) - Put if case for 16-bits at the beginning of the if ladder. - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	a4031bdfa9	i965/fs: Predicate byte scattered writes if needed While on Untyped Surface messages the bits of the execution mask are ANDed with the corresponding bits of the Pixel/Sample Mask, that is not the case for byte scattered writes. That is needed to avoid ssbo stores writing on helper invocations. So when that can affect, we load the sample mask, and predicate the send message. Note: the need for this patch was tested with a custom test. Right now the 16 bit storage CTS tests doesnt need this path in order to get a full pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	96f1926aab	i965/fs: Use byte_scattered_write on 16-bit store_ssbo We need to rely on byte scattered writes as untyped writes are 32-bit size. We could try to keep using 32-bit messages when we have two or four 16-bit elements, but for simplicity sake, we use the same message for any component number. We revisit this aproach in the follwing patches. v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: (Jason Ekstrand) - Include bit_size to scattered write message and remove namespace - specific for scattered messages. - Move comment to proper place. - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo. (Jose Maria Casanova) - Take into account that get_nir_src returns now WORD types for 16-bit sources instead of DWORD. v4: (Jason Ekstrand) - Rename lenght variable to num_components. - Include assertions before emit_untyped_write. - Remove type_slot in favor of num_slot and first_slot. Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	82fa4d45e7	i965/fs: Enable rounding mode on f2f16 ops By default we don't set the rounding mode. We only set round-to-near-even or round-to-zero mode if explicitly set from nir. v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode (Curro) v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	5d5ee507fb	i965/fs: Handle 32-bit to 16-bit conversions Conversions to 16-bit need having aligment between the 16-bit and 32-bit types. So the conversion operations unpack 16-bit types to with an stride=2 and then applies a MOV with the conversion. v2 (Jason Ekstrand): - Avoid the general use of stride=2 for 16-bit register types. v3 (Topi Pohjolainen) - Code style fix (Jason Ekstrand) - Now nir_op_f2f16 was renamed to nir_op_f2f16_undef because conversion to f16 with undefined rounding is explicit Signed-off-by: Eduardo Lima <elima@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Kenneth Graunke	ff964916dc	i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code. We use the same hardware mechanism for both atomic counters and SSBO atomics, so there's really no benefit to maintaining separate code to handle each case. Instead, we can just use Rob's shiny new NIR pass to convert atomic_uints to SSBOs, and delete piles of code. The ssbo_start section of the binding table becomes a combined ABO and SSBO section, with ABOs first, then SSBOs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-15 09:37:32 -08:00
Matt Turner	6ac2d16901	i965/fs: Fix extract_i8/u8 to a 64-bit destination The MOV instruction can extract bytes to words/double words, and words/double words to quadwords, but not byte to quadwords. For unsigned byte to quadword, we can read them as words and AND off the high byte and extract to quadword in one instruction. For signed bytes, we need to first sign extend to word and the sign extend that word to a quadword. Fixes the following test on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-14 10:56:18 -08:00

... 6 7 8 9 10

494 commits