fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-05 09:38:07 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	aaae24179f	intel/compiler: fix ddy for half-float in Broadwell Broadwell has restrictions that apply to Align16 half-float that make the Align16 implementation of this invalid for this platform. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in cherryview, gen9 or gen10, where the Align16 implementation seems to work just fine. v2: - Rework the comment in the code, move the PRM citation from the commit message to the comment in the code (Matt) - Cherryview isn't affected, only Broadwell (Matt) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	60c7c6d3ba	intel/compiler: fix ddx and ddy for 16-bit float We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components in a single SIMD register, so for example, component Y of a 16-bit vec2 starts at byte offset 16B. This means that when we compute the offset of the elements to be differentiated we should not stomp whatever base offset we have, but instead add to it. v2 - Use byte_offset() helper (Jason) - Merge the fix for SIMD8: using byte_offset() fixes that too. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	8f40d392b9	intel/compiler: set correct precision fields for 3-source float instructions Source0 and Destination extract the floating-point precision automatically from the SrcType and DstType instruction fields respectively when they are set to types :F or :HF. For Source1 and Source2 operands, we use the new 1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1 means half-precision. Since we always use the type of the destination for all operands when we emit 3-source instructions, we only need set Src1Type and Src2Type to 1 when we are emitting a half-precision instruction. v2: - Set the bit separately for each source based on its type so we can do mixed floating-point mode in the future (Topi). v3: - Use regular citation style for the comment referencing the PRM (Matt). - Decided not to add asserts in the emission code to check that only mixed HF/F types are used since such checks would break negative tests for brw_eu_validate.c (Matt) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	e6b7410187	intel/compiler: allow half-float on 3-source instructions since gen8 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ee049f6b71	intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits We are now using these bits, so don't assert that they are not set. In gen8, if these bits are set compaction is not possible. On gen9 and CHV platforms set_3src_control_index() checks these bits (and others) against a table to validate if the particular bit combination is eligible for compaction or not. v2 - Add more detail in the commit message explaining the situation for SKL+ and CHV (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	120c970619	intel/compiler: add new half-float register type for 3-src instructions This is available since gen8. v2: restore previously existing assertion. v3: don't use separate tables for gen7 and gen8, just assert that we don't use half-float before gen8 (Matt) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	4ab2b97a8f	intel/compiler: add instruction setters for Src1Type and Src2Type. The original SrcType is a 3-bit field that takes a subset of the types supported by the hardware for 3-source instructions. Since gen8, when the half-float type was added, 3-source floating point operations can use mixed precision mode, where not all the operands have the same floating-point precision. While the precision for the first operand is taken from the type in SrcType, the bits in Src1Type (bit 36) and Src2Type (bit 35) define the precision for the other operands (0: normal precision, 1: half precision). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	a8d8b1a139	intel/compiler: drop unnecessary temporary from 32-bit fsign implementation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	19cd2f5deb	intel/compiler: implement 16-bit fsign v2: - make 16-bit be its own separate case (Jason) v3: - Drop the result_int temporary (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	4588f4a604	intel/compiler: handle extended math restrictions for half-float Extended math with half-float operands is only supported since gen9, but it is limited to SIMD8. In gen8 we lower it to 32-bit. v2: quashed together the following patches (Jason): - intel/compiler: allow extended math functions with HF operands - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9 - intel/compiler: extended Math is limited to SIMD8 on half-float Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (allow extended math functions with HF operands, extended Math is limited to SIMD8 on half-float)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	114f4e6c29	intel/compiler: lower some 16-bit float operations to 32-bit The hardware doesn't support half-float for these. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	b6a454791b	intel/compiler: assert restrictions on conversions to half-float There are some hardware restrictions that brw_nir_lower_conversions should have taken care of before we get here. v2: - rebased on top of regioning lowering pass Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	66806405af	intel/compiler: handle b2i/b2f with other integer conversion opcodes Since we handle booleans as integers this makes more sense. v2: - rebased to incorporate new boolean conversion opcodes v3: - rebased on top regioning lowering pass Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	92f4761198	intel/compiler: split float to 64-bit opcodes from int to 64-bit Going forward having these split is a bit more convenient since these two groups have different restrictions. v2: - Rebased on top of new regioning lowering pass. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	3e377c68f8	intel/compiler: add a NIR pass to lower conversions Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles on conversion sources. v3 - Run the pass earlier, right after nir_opt_algebraic_late (Jason) - NIR alu output types already have the bit-size (Jason) - Use 'is_conversion' to identify conversion operations (Jason) v4: - Be careful about the intermediate types we use so we don't lose range and avoid incorrect rounding semantics (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Dominik Drees	829f278ad0	Add no_aos_sampling GALLIVM_PERF option This forces using general sampling and should improve precision and performance in some cases.	2019-04-17 22:16:19 +00:00
Samuel Pitoiset	ad6dc13fc7	ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+ This change requires LLVM r356465. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-17 22:10:30 +02:00
Samuel Pitoiset	26ea506235	ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+ This change requires LLVM r356465. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-17 22:10:28 +02:00
Samuel Pitoiset	6fd5e39b60	ac: add support for more types with struct/raw LLVM intrinsics LLVM 9+ now supports 8-bit and 16-bit types. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-17 22:10:25 +02:00
Samuel Pitoiset	9cf55b022d	radv: add VK_KHR_shader_atomic_int64 but disable it for now No support for 64-bit compare&swap atomic operations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:56 +02:00
Samuel Pitoiset	d118e382dd	ac/nir: add 64-bit SSBO atomic operations support Except compare&swap which is still buggy. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:54 +02:00
Samuel Pitoiset	78c551aca1	ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap Use the raw version (i.e. IDXEN=0) because vindex is unused. Use the old intrinsic for compare&swap because the new one hangs the GPU for some reason. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:52 +02:00
Roland Scheidegger	dded2edf8b	gallivm: fix saturated signed add / sub with llvm 9 llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and now llvm 9 removed the signed versions as well - they were proposed for removal earlier, but the pattern to recognize those was very complex, so it wasn't done then. However, instead of these arch-specific intrinsics, there's now arch-independent intrinsics for saturated add / sub, both for signed and unsigned, so use these. They should have only advantages (work with arbitrary vector sizes, optimal code for all archs), although I don't know how well they work in practice for other archs (at least for x86 they do the right thing). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454 Reviewed-by: Brian Paul <brianp@vmware.com>	2019-04-17 17:42:13 +02:00
Juan A. Suarez Romero	b74e605cf4	meson: Add dependency on genxml to anvil genfiles This fixes a race condition where anv_gen_files are executed before genxml files, which causes a build failure v2: add dependency on idep_genxml (Lionel) Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-17 15:49:55 +02:00
Lionel Landwerlin	baf59e40cd	intel/perf: constify accumlator parameter Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	93dbe52ab0	intel/perf: drop counter size field We can deduce the size from another field, let's just save some space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	a646485c28	i965: perf: add mdapi pipeline statistics queries on gen10/11 The Gen10+ expected format adds an additional counter which we can't disclose yet. We can still make the size of the expected query result match. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	d855906366	intel/perf: stub gen10/11 missing definitions Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	d47cc4acbf	i965: move mdapi guid into intel/perf One more thing we want to share between the different APIs. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	b48d6d7471	i965: move mdapi result data format to intel/perf We want to reuse this in Anv. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	2be07fc751	i965: move brw_timebase_scale to device info Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	41b54b5faf	i965: move OA accumulation code to intel/perf We'll want to reuse this in our Vulkan extension. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	f6bba7760f	i965: move mdapi data structure to intel/perf We'll want to reuse those structures later on. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	134e750e16	i965: extract performance query metrics We would like to reuse performance query metrics in other APIs. Let's make the query code dealing with the processing of raw counters into human readable values API agnostic. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	603ddda622	i965: store device revision in gen_device_info Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-17 14:10:42 +01:00
Topi Pohjolainen	ea42ba36b9	intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27 Similarly to `1cc17fb731` Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-17 14:55:49 +03:00
Erik Faye-Lund	ce1761edab	virgl: document potentially failing blit This blit can fail, but this is not new; in the old version we didn't even try to blit in this case. So let's just document the limitation for now, and leave this for another day. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	3fdacf1c39	virgl: do color-conversion during when mapping transfer When running on OpenGL ES, we can't just map any format for reading, because of limitations on glReadPixels. So let's fall back to the blit code-path, and translate the pixels to the correct format in the end. This fixes the remaining failures of KHR-GL32.packed_pixels.* apart from the sRGB tests. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	9e9d9b352e	virgl: only blit if resource is read Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	fba03322a2	virgl: get readback-formats from host Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	749bbd39c7	gallium/util: support translating between uint and sint formats Without this, we can't for instance convert between r8_sint and r8g8b8a8_sint. But that's pretty useful, so let's support it as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	f31b65f1c1	virgl: make sure bind is set for non-buffers Otherwise, virglrenderer will reject the resource. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	afbd68378a	virgl: support write-back with staged transfers We currently don't support writing to resources that uses a temporary staging-resource to resolve the pixels. If a write-bit was set, we forgot to perform a blit back to the old resource, followed by trying to update the wrong resource, which lacks backing-storage. The end-result would be that nothing useful happened. This approach also fixes a few smaller bugs, like using the wrong box (without x y and z zeroed out), which means a partial update of a multisampled texture could result in the wrong part of the texture being updated. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	0bc8683ffa	virgl: use pipe_box for blit dst-rect Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	121e366632	virgl: rewrite core of virgl_texture_transfer_map Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	1f27bd3f2b	virgl: return error if allocating resolve_tmp fails Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	fc8b1ca33a	virgl: wait for the right resource In case we're resolving, we need to wait for the resolved resource instead of the original one. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	6263304b2d	virgl: check for readback on correct resource Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	ac932ff822	virgl: make unmap queuing a bit more straight-forward It's hard to read the code that decides if we want to queue up an unmap or destroy the transfer right away. So let's make it a bit simpler, by setting a bool in case we want to queue it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	b08e73308e	virgl: simplify virgl_texture_transfer_unmap logic There's no reason to keep an extra indentation level here, let's merge the two if-conditions. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00


110075 commits