fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-25 03:58:19 +02:00

Author	SHA1	Message	Date
Chad Versace	85ca563b58	anv: Drop 'x11' prefix from non-X11 WSI funcs Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The functions are used by Wayland WSI too. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2017-04-28 08:54:45 -07:00
Jason Ekstrand	ebd1bd6998	anv: Alphabetize KHR extensions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-04-28 07:41:03 -07:00
Jason Ekstrand	032861693e	anv: Move queues, events, and semaphores to their own file Things are about to get more complicated, especially as far as semaphores are concerned. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	9bd1f03487	anv: Implement VK_KHX_external_memory_fd This commit just exposes the memory handle type. There's nothing interesting we need to do here for images. So long as the user doesn't set any crazy environment variables such as INTEL_DEBUG=nohiz, all of the compression formats etc. should "just work" at least for opaque handle types. v2 (chadv): - Rebase. - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when handleType == 0. - Move handleType-independency comments out of handleType-switch, in vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in future dma_buf patches. Co-authored-with: Chad Versace <chadversary@chromium.org> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	818b857914	anv: Use the BO cache for DeviceMemory allocations Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	494d6f65a7	anv/allocator: Add a BO cache This cache allows us to easily ensure that we have a unique anv_bo for each gem handle. We'll need this in order to support multiple-import of memory objects and semaphores. v2 (Jason Ekstrand): - Reject BO imports if the size doesn't match the prime fd size as reported by lseek(). Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	5d25ac6a4b	anv: Implement VK_KHX_external_memory This is the trivial implementation that just exposes the extension string but exposes zero external handle types. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Chad Versace	354ca7a1d4	anv: Implement VK_KHX_external_memory_capabilities This is a complete but trivial implementation. It's trivial because we support no external memory capabilities yet. Most of the real work in this commit is in reworking the UUIDs advertised by the driver. v2 (chadv): - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR. Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of input structs, not the chain of output structs. - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the input chain and the output chain separately. Reduces diff in future dma_buf patches. Co-authored-with: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	d4d9258b61	anv/physical_device: Rename uuid to pipeline_cache_uuid We're about to have more UUIDs for different things so this one really needs to be properly labeled. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	02767cb4ff	anv: Refactor device_get_cache_uuid into physical_device_init_uuids Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	35e626bd0e	anv: Set EXEC_OBJECT_ASYNC when available Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	bd3a9813b9	anv/cmd_buffer: Use the device allocator for QueueSubmit The command is really operating on a Queue not a command buffer and the nearest object to that with an allocator is VkDevice. Reviewed-by: Chad Versace <chadversary@chromium.org> Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	c43b4bc85e	anv: Don't place scratch buffers above the 32-bit boundary This fixes rendering corruptions in DOOM. Hopefully, it will also make Jenkins a bit more stable as we've been seeing some random failures and GPU hangs ever since turning on 48bit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620 Fixes: `651ec926fc` "anv: Add support for 48-bit addresses" Tested-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-04-27 02:04:57 -07:00
Rafael Antognolli	6a40ccec4b	genxml: Fix gen_pack_header.py crash when field type is invalid. Just return earlier in that case. Also set prefix to an empty string, so we don't get to use it undefined. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:12 -07:00
Rafael Antognolli	9670124e31	genxml: Make BLEND_STATE command support variable length array. We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers dwords (on gen8+), but the BLEND_STATE struct length is always 17. By marking it size 1, which is actually the size of the struct minus the BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of entries. For gen6 and gen7 we set length to 0, since it only contains BLEND_STATE_ENTRY's, and no other data. With this change, we also change the code for blorp and anv to emit only the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on gen6-7 and 17 dwords on gen8+. v2: - Use designated initializers on blorp and remove 0 from initialization (Jason) - Default entries to disabled on Vulkan (Jason) - Rebase code. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:10 -07:00
Rafael Antognolli	4ace73b1f6	genxml: Fix python crash when no dwords are found. If the 'dwords' dict is empty, max(dwords.keys()) throws an exception. This case could happen when we have an instruction that is only an array of other structs, with variable length. v2: - Add another clause for empty dwords and make it work with python 3 (Dylan) - Set the length to 0 if dwords is empty, and do not declare dw Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:08 -07:00
Rafael Antognolli	19720405d5	genxml: Remove unused parameter. 'start' parameter from Group.emit_pack_function() is useless. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:05 -07:00
Rafael Antognolli	1ea41163eb	intel/aubinator: Correctly read variable length structs. Before this commit, when a group with count="0" is found, only one field is added to the struct representing the instruction. This causes only one entry to be printed by aubinator, for variable length groups. With this commit we "detect" that there's a variable length group (count="0") and store the offset of the last entry added to the struct when reading the xml. When finally reading the aubdump file, we check the size of the group and whether we have variable number of elements, and in that case, reuse the last field to add the remaining elements. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Tested-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:13:51 -07:00
Nanley Chery	50134cede1	isl/format: Update the R16G16B16X16_FLOAT entry The section of the PRM mentioned in the code comment above this table says that this format supports the render target write message. Internal documentation says that this format also supports alpha blending. As a side effect, this allows CCS_D buffers to be created for images with this format. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-24 13:30:50 -07:00
Nanley Chery	b1066f7365	anv/pass: Delete anv_pass::subpass_attachments This field has no users. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-24 13:30:50 -07:00
Francisco Jerez	58324389be	intel/fs: Take into account amount of data read in spilling cost heuristic. Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit `147e71242c` with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 11:01:40 -07:00
Francisco Jerez	ecc19e12dc	intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy. This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 10:59:56 -07:00
Kenneth Graunke	6b10c37b9c	i965/vec4: Use reads_accumulator_implicitly(), not MACH checks. Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-24 10:53:49 -07:00
Timothy Arceri	7a7ee40c2d	nir/i965: add before ffma algebraic opts This shuffles constants down in the reverse of what the previous patch does and applies some simplifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Kenneth Graunke	2faf227ec2	i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce(). opt_register_coalesce() was optimizing sequences such as: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D mov(8) m4.zw:F, vgrf5.xxxy:F into: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue. No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves). Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-22 00:01:16 -07:00
Jason Ekstrand	1e21d4227e	anv/query: Use genxml for MI_MATH Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-04-20 15:24:06 -07:00
Jason Ekstrand	e23129ac0c	genxml: Add better support for MI_MATH This breaks the guts of MI_MATH (the instruction part) out into its own structure with proper named values. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-04-20 15:24:06 -07:00
Jason Ekstrand	b7a2af8e38	genxml/pack: Allow hex values in the XML Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-04-20 15:24:06 -07:00
Nanley Chery	d9d793696b	anv/cmd_buffer: Disable CCS on BDW input attachments The description under RENDER_SURFACE_STATE::RedClearColor says, For Sampling Engine Multisampled Surfaces and Render Targets: Specifies the clear value for the red channel. For Other Surfaces: This field is ignored. This means that the sampler on BDW doesn't support CCS. Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-17 16:47:38 -07:00
Lionel Landwerlin	d71efbe5f2	anv: blorp: flush memory after copy Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-17 14:45:57 -07:00
Kenneth Graunke	9b71709cb8	intel/decoder: Fix is_header_field starting condition. Starting positions >= 32 are not part of the header, rather than >. Caught by Coverity, which found that "bits <<= field->start" may shift by 32, which has undefined behavior. CID: 1404968 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-16 22:58:23 -07:00
Jason Ekstrand	d2d6cf6c83	anv: Add the pci_id into the shader cache UUID This prevents a user from using a cache created on one hardware generation on a different one. Of course, with Intel hardware, this requires moving their drive from one machine to another but it's still possible and we should prevent it. Reviewed-by: Chad Versace <chadversary@chromium.org> Cc: mesa-stable@lists.freedesktop.org	2017-04-14 17:41:07 -07:00
Matt Turner	2eeb1b0ad9	i965: Use correct VertStride on align16 instructions. In commit `c35fa7a`, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	d8441e2276	i965/vec4/dce: improve track of partial flag register writes This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	c1fc8fad47	i965/vec4: don't do horizontal stride on some register file types horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Matt Turner	21e8e3a848	i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT. Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	f030aaf2fb	i965/vec4: use vec4_builder to emit instructions in setup_imm_df() Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero	a907c91e93	i965/vec4: consider subregister offset in live variables Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take in account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	92649a3e67	i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB On IVB, DF instructions have lowered the SIMD width to 4 but the exec_size will be later doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	6e3265eae5	i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	50a5217637	i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitly in two different instructions: VEC4_OPCODE_FROM_DOUBLE just does the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	cfaf14a126	i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	be445d3ea3	i965/vec4: keep original type when dealing with null registers Keep the original type when dealing with null registers. Especially because we do not want to introduce an implicit conversion between types that could affect the conditional flags. This affects especially when the original type is DF, and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a21dc2b500	i965/vec4: split DF instructions and later double its execsize in IVB/BYT We need to split DF instructions in two on IVB/BYT as it needs an execsize 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indention and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a5399e8b1c	i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	ebfb703d44	i965/fs: Get 64-bit indirect moves working on IVB. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Matt Turner	630b84cdc8	i965: Use source region <1,2,0> when converting to DF. Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	3198ce3f96	i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT According to the IVB and HSW PRMs: "2.When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	571cbd05eb	i965/fs: fix dst stride in IVB/BYT type conversions When doing DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modify IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	af6fc3a8ea	i965/fs: rename lower_d2x to lower_conversions v2: - Change the name to lower_conversions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00

1623 commits
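
Commit 4ace73b1f6 in the log above says that max(dwords.keys()) throws an exception when the 'dwords' dict is empty, and that v2 sets the length to 0 in that case. The following is a minimal sketch of that failure mode and one possible guard. It is not the actual gen_pack_header.py code: the helper name struct_length and the "highest index plus one" length rule are assumptions made only for illustration.

```python
def struct_length(dwords):
    """Hypothetical helper: number of dwords covered by a dict keyed by
    dword index. Returns 0 for an empty dict instead of calling max()."""
    if not dwords:
        # An instruction made up only of a variable-length array of other
        # structs has no fixed dwords, so report a length of 0.
        return 0
    # Assumed rule for this sketch: highest dword index plus one.
    return max(dwords.keys()) + 1


# Calling max() on the keys of an empty dict raises ValueError in Python 3.
try:
    max({}.keys())
except ValueError as exc:
    print("unguarded max() failed:", exc)

print(struct_length({}))
print(struct_length({0: "header", 2: "entry"}))
```

The guard is placed before the max() call so that the empty case never reaches it; the real script may structure this differently.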