fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 11:28:05 +02:00

Author	SHA1	Message	Date
Jose Maria Casanova Crespo	69be3a82ca	i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout Restrict the use of untyped_surface_write with 16-bit pairs in ssbo to the cases where we can guarantee that offset is multiple of 4. Taking into account that VK_KHR_relaxed_block_layout is available in ANV we can only guarantee that when we have a constant offset that is multiple of 4. For non constant offsets we will always use byte_scattered_write. v2: (Jason Ekstrand) - Assert offset_reg to be multiple of 4 if it is immediate. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	8dd8be0323	i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout 16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are multiple of 4-bytes as required by untyped_read message. This happens for example in the case of f16mat3x3 when then VK_KHR_relaxed_block_layout is enabled. Vectors reads when we have non-constant offsets are implemented with multiple byte_scattered_read messages that not require 32-bit aligned offsets. Now for all constant offsets we can use the untyped_read_surface message. In the case of constant offsets not aligned to 32-bits, we calculate a start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data function and the first_component parameter to skip the copy of the unneeded component. v2: (Jason Ekstrand) Use untyped_read_surface messages always we have constant offsets. v3: (Jason Ekstrand) Simplify loop for reads with non constant offsets. Use end - start to calculate the number of 32-bit components to read with constant offsets. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	2dd94f462b	i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components This helper used to load 16bit components from 32-bits read now allows skipping components with the new parameter first_component. The semantics now skip components until we reach the first_component, and then reads the number of components passed to the function. All previous uses of the helper are updated to use 0 as first_component. This will allow read 16-bit components when the first one is not aligned 32-bit. Enabling more usages of untyped_reads with 16-bit types. v2: (Jason Ektrand) Change parameters order to first_component, num_components Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Jose Maria Casanova Crespo	67d7dd594e	isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit The surfaces that backup the GPU buffers have a boundary check that considers that access to partial dwords are considered out-of-bounds. For example, buffers with 1,3 16-bit elements has size 2 or 6 and the last two bytes would always be read as 0 or its writting ignored. The introduction of 16-bit types implies that we need to align the size to 4-bytew multiples so that partial dwords could be read/written. Adding an inconditional +2 size to buffers not being multiple of 2 solves this issue for the general cases of UBO or SSBO. But, when unsized arrays of 16-bit elements are used it is not possible to know if the size was padded or not. To solve this issue the implementation calculates the needed size of the buffer surfaces, as suggested by Jason: surface_size = isl_align(buffer_size, 4) + (isl_align(buffer_size, 4) - buffer_size) So when we calculate backwards the buffer_size in the backend we update the resinfo return value with: buffer_size = (surface_size & ~3) - (surface_size & 3) It is also exposed this buffer requirements when robust buffer access is enabled so these buffer sizes recommend being multiple of 4. v2: (Jason Ekstrand) Move padding logic fron anv to isl_surface_state. Move calculus of original size from spirv to driver backend. v3: (Jason Ekstrand) Rename some variables and use a similar expresion when calculating. padding than when obtaining the original buffer size. Avoid use of unnecesary component call at brw_fs_nir. v4: (Jason Ekstrand) Complete comment with buffer size calculus explanation in brw_fs_nir. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-28 21:37:40 -08:00
Mathias Fröhlich	4c232dc721	vbo: Remove vbo_save_vertex_list::vertex_size. Like before use local variables from compile_vertex_list instead. Remove vertex_size from struct vbo_save_vertex_list. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	478a9bc7bb	vbo: Remove vbo_save_vertex_list::buffer_offset. The buffer_offset is used in aligned_vertex_buffer_offset. But now that most of these decisions are done in compile_vertex_list we can work on local variables instead of struct members in the display list code. Clean that up and remove buffer_offset. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	bfa8d8e5bf	vbo: Remove vbo_save_vertex_list::start_vertex. Replace last use on replay with _vbo_save_get_{min,max}_index. Appart from that it is not used anymore. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	6dd3e98c21	vbo: Remove vbo_save_vertex_list::attrsz. Is not used anymore on replay, move the last use in display list compilation to the original array in the display list compiler. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	95b4be4f29	vbo: Remove vbo_save_vertex_list::attrtype. Is not used anymore on replay, move the last use in display list compilation to the original array in the display list compiler. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	77df52cc4f	vbo: Remove vbo_save_vertex_list::enabled. Is not used anymore on replay. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	19a0f27a49	vbo: Remove reference to the vertex_store from the dlist node. Since we now store a set of VAOs in the display list, use these object to get the reference to the VBO in several places. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	6e410270ee	vbo: Implement current values update in terms of the VAO. Use the information already present in the VAO to update the current values after display list replay. Set GL_OUT_OF_MEMORY on allocation failure for the current value update storage. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	08aa0d9bf4	vbo: Implement vbo_loopback_vertex_list in terms of the VAO. Use the information already present in the VAO to replay a display list node using immediate mode draw commands. Use a hand full of helper methods that will be useful for the next patches also. v2: Insert asserts, constify local variables. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	f7178d677c	vbo: Use a local variable for the dlist offsets. The master value is now stored inside the VAO already present in struct vbo_save_vertex_list. Remove the unneeded copy from dlist storage. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	1cc3516a11	vbo: Remove unused vbo_save_context::wrap_count. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Mathias Fröhlich	07915020f0	vbo: Remove unused vbo_save_vertex_list::dangling_attr_ref. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-03-01 04:06:23 +01:00
Jason Ekstrand	6d3edbea16	anv: Always set has_context_priority We don't zalloc the physical device so we need to unconditionally set everything. Crucible helpfully initializes all allocations to 139 so it was getting true regardless of whether or not the kernel actually supports context priorities. Fixes: `6d8ab53303` "anv: implement VK_EXT_global_priority extension" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 17:31:20 -08:00
Mark Janes	0fc009b8c7	Revert "i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+" This reverts commit `a2c1e48f15`. On BDWGT3e and KBLGT3e systems, this commit regressed the following tests: piglit.spec.ext_framebuffer_multisample.accuracy 2 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy 4 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy 6 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy 8 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy all_samples stencil_resolve small depthstencil	2018-02-28 17:26:08 -08:00
Dave Airlie	6c1b5a40fd	radeonsi/nir: increase values to 8 for gs fetch. This stops a crash when running (still fails): tests/spec/arb_gpu_shader_fp64/execution/explicit-location-gs-fs-vs.shader_test Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-01 10:35:09 +10:00
Bas Nieuwenhuizen	f9898b211e	radv: Use the syncobj wait ioctl to wait on fences if possible. Handles the !waitAll and signal after the start of the wait cases correctly. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-01 01:07:18 +01:00
Bas Nieuwenhuizen	34bd5e2e2e	radv: Implement more efficient !waitAll fence waiting. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-01 01:07:18 +01:00
Bas Nieuwenhuizen	6968d782d3	radv: Implement waiting on non-submitted fences. Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-01 01:07:18 +01:00
Bas Nieuwenhuizen	2a404c6f92	radv: Implement WaitForFences with !waitAll. Nothing to do except using a busy wait loop. At least for old kernels. A better implementation for newer kernels to come later. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105255 Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-01 01:07:18 +01:00
Dave Airlie	49879f3778	ac/nir: fix shared atomic operations. The nir->llvm conversion was using the wrong srcs. Fixes: tests/spec/arb_compute_shader/execution/shared-atomics.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-01 10:06:06 +10:00
Dave Airlie	69495b30a3	ac/nir: don't apply slice rounding on txf_ms This matches the tgsi code. Fixes arb_texture_multisample texelFetch piglit tests. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: `f4e499ec79` (radv: add initial non-conformant radv vulkan driver) Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-01 10:04:34 +10:00
Timothy Arceri	f383fec903	radeonsi: set some context vars for nir path Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-01 10:51:56 +11:00
Timothy Arceri	7e46214f87	gallium: remove llvm from ir struct This was added in `425dc4c4b3` but never used. Also since `100796c15c` native has superseded llvm. Acked-by: Dave Airlie <airlied@redhat.com>	2018-03-01 10:51:56 +11:00
Kenneth Graunke	e51b0664e0	i965: Don't emit MOVs with undefined registers for Gen4 point clipping. Gen4 point clipping calls brw_clip_tri_alloc_regs with nr_verts == 0, which means that c->reg.vertex[] isn't initialized. It then emits MOVs to stomp components of those uninitialized registers to 0. This started causing assertions after Matt's recent series, when those uninitialized registers started getting BRW_REGISTER_TYPE_NF, which definitely doesn't exist on Gen4-5. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-02-28 15:03:51 -08:00
Eric Anholt	e4e79a02da	broadcom/vc5: Fix regression in the page-cache slice size alignment. We need to align the size of the slice, not the offset of the next slice. Fixes KHR-GLES3.texture_repeat_mode.rgba32ui_11x131_2_clamp_to_edge. Fixes: `b4b4ada761` ("broadcom/vc5: Fix layout of 3D textures.")	2018-02-28 13:59:50 -08:00
Jason Ekstrand	a2c1e48f15	i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+ Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 13:31:42 -08:00
Jason Ekstrand	67da59e320	i965: Be more clever about setting up our viewport clip Before, we were trusting in the hardware to take the intersection of the viewport clip with the drawing rectangle. Unfortunately, 3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly does a full pipeline stall. If we're a bit more careful with our viewport clipping, we can just re-emit it once at context creation time. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 13:31:42 -08:00
Matt Turner	debaa822ef	intel/compiler: Re-add .vs_inputs_dual_locations = true Looks like a rebase mistake. Fixes: `89fe5190a2` ("intel/compiler: Lower flrp32 on Gen11+")	2018-02-28 13:25:21 -08:00
Dave Airlie	7cb9353de3	r600/shader: when using images always load thread id gpr at start (v2) The delayed loading code was fail if we had control flow. This fixes: tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test v2: don't use temp_reg before setting temp_reg up. Tested-by: Gert Wollny <gw.fossdev@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-28 20:16:19 +00:00
Dave Airlie	8369fdee8b	r600: fix whitespace in recent 1d texture commit. trivial fix.	2018-02-28 20:16:19 +00:00
Matt Turner	6f00bf519d	intel/compiler: Add ICL to test_eu_validate.cpp With the Align16 tests now disabled, we can run the rest of the tests in ICL mode (and see them pass!) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	ff4b41dd1d	intel/compiler: Disable Align16 tests on Gen11+ Align16 is no more. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	c31d77ac22	intel/compiler: Add instruction compaction support on Gen11 Gen11 only differs from SKL+ in that it uses a new datatype index table. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	d5bf093cf9	intel/compiler: Mark line, pln, and lrp as removed on Gen11+ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	89fe5190a2	intel/compiler: Lower flrp32 on Gen11+ The LRP instruction is no more. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	2134ea3800	intel/compiler/fs: Implement ddy without using align16 for Gen11+ Align16 is no more. We previously generated an align16 ADD instruction to calculate DDY: add(16) g25<1>F -g23<4>.xyxyF g23<4>.zwzwF { align16 1H }; Without align16, we now implement it as: add(4) g25<1>F -g23<0,2,1>F g23.2<0,2,1>F { align1 1N }; add(4) g25.4<1>F -g23.4<0,2,1>F g23.6<0,2,1>F { align1 1N }; add(4) g26<1>F -g24<0,2,1>F g24.2<0,2,1>F { align1 1N }; add(4) g26.4<1>F -g24.4<0,2,1>F g24.6<0,2,1>F { align1 1N }; where only the first two instructions are needed in SIMD8 mode. Note: an earlier version of the patch implemented this in two instructions in SIMD16: add(8) g25<2>F -g23<4,2,0>F g23.2<4,2,0>F { align1 1N }; add(8) g25.1<2>F -g23.1<4,2,0>F g23.3<4,2,0>F { align1 1N }; but I realized that the channel enable bits will not be correct. If we knew we were under uniform control flow, we could emit only those two instructions however. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	62cfd4c656	intel/compiler/fs: Simplify ddx/ddy code generation The brw_reg() constructor just obfuscates things here, in my opinion. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	bed0267ff6	intel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode In a future patch, generate_ddy will want to inspect inst->exec_size. Change generate_ddx as well for consistency. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	3a584a15c0	intel/compiler/fs: Don't generate integer DWord multiply on Gen11 Like CHV et al., Gen11 does not support 32x32 -> 32/64-bit integer multiplies. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	432674ce93	intel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+ The PLN instruction is no more. Its functionality is now implemented using two MAD instructions with the new native-float type. Instead of pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F we now have mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F ... and in the case of SIMD8 only the first pair of MAD instructions is used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	b5d8781e19	intel/compiler/fs: Return multiple_instructions_emitted from generate_linterp If multiple instructions are emitted, special handling of things like conditional mod and NoDDClr/NoDDChk need to be performed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	b1afdf9fc1	intel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair This isn't technically broken, but the next patch will make this function report whether it generated multiple instructions, and that information will be used to disable the application of conditional mod by the generic code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	2cff324210	intel/compiler: Add Gen11+ native float type This new type exposes the additional precision offered by the accumulator register and will be used in the next patch to implement the functionality of the PLN instruction using a pair of MAD instructions. One weird thing to note: align1 ternary instructions may only have an accumulator in the dst or src1 normally, but when src0's type is :NF the accumulator is read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	58611ff913	intel/compiler: Add Gen11 register types The hardware register types' encodings have changed on Gen11. Good thing we have that superfluous looking brw_reg_type abstraction lying around! Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	bb428454a9	intel: Disable 64-bit extensions on platforms without 64-bit types Gen11 does not support DF, Q, UQ types in hardware. As a result, we have to disable some GL extensions until they can be reimplemented. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-02-28 11:15:47 -08:00
Anuj Phogat	5e42103f3b	intel: Add icl pci id for INTEL_DEVID_OVERRIDE Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>	2018-02-28 11:15:47 -08:00

1 2 3 4 5 ...

100533 commits