fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-21 00:18:09 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	41b54b5faf	i965: move OA accumulation code to intel/perf We'll want to reuse this in our Vulkan extension. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	f6bba7760f	i965: move mdapi data structure to intel/perf We'll want to reuse those structures later on. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	134e750e16	i965: extract performance query metrics We would like to reuse performance query metrics in other APIs. Let's make the query code dealing with the processing of raw counters into human readable values API agnostic. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	603ddda622	i965: store device revision in gen_device_info Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-17 14:10:42 +01:00
Topi Pohjolainen	ea42ba36b9	intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27 Similarly to `1cc17fb731` Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-17 14:55:49 +03:00
Jason Ekstrand	583a4d9a27	intel/mi_builder: Disable mem_mem tests on IVB Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-04-16 12:59:12 -05:00
Jason Ekstrand	56d9532316	intel/mi_builder: Re-order an initializer The order doesn't matter in C99 but some C++ compilers seem to care. Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-04-16 12:07:15 -05:00
Kenneth Graunke	fad7801afd	i965: Move program key debugging to the compiler. The i965 driver has a bunch of code to compare two sets of program keys and print out the differences. This can be useful for debugging why a shader needed to be recompiled on the fly due to non-orthogonal state dependencies. anv doesn't do recompiles, so we didn't need to share this in the past - but I'd like to use it in iris. This moves the bulk of the code to the compiler where it can be reused. To make that possible, we need to decouple it from i965 - we can't get at the brw program cache directly, nor use brw_context to print things. Instead, we use compiler->shader_perf_log(), and simply pass in keys. We put all of this debugging code in brw_debug_recompile.c, and only export a single function, for simplicity. I also tidied the code a bit while moving it, now that it all lives in one file. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:15 -07:00
Tapani Pälli	624789e370	compiler/glsl: handle case where we have multiple users for types Both Vulkan and OpenGL might be using glsl_types simultaneously or we can also have multiple concurrent Vulkan instances using glsl_types. Patch adds a one time init to track number of users and will release types only when last user calls _glsl_type_singleton_decref(). This change fixes glsl_type memory leaks we have with anv driver. v2: reuse hash_mutex, cleanup, apply fix also to radv driver and rename helper functions (Jason) v3: move init, destroy to happen on GL context init and destroy Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-16 12:58:00 +03:00
Danylo Piliaiev	04508f57d1	intel/compiler: Do not reswizzle dst if instruction writes to flag register If we write to the flag register changing the swizzle would change what channels are written to the flag register. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201 Fixes: `4cd1a0be` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: <ian.d.romanick@intel.com>	2019-04-16 09:42:08 +00:00
Dylan Baker	95aefc94a9	Delete autotools Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:29 -07:00
Jason Ekstrand	90108deb27	anv: Update to use the new features struct names These were updated in version 1.1.106 of vulkan.h to make more sense with the extension names. We may as well keep with the times. Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-15 13:25:43 +00:00
Kenneth Graunke	8bf9b7b5b6	intel: Emit 3DSTATE_VF_STATISTICS dynamically Pipeline statistics queries should not count BLORP's rectangles. (23) How do operations like Clear, TexSubImage, etc. affect the results of the newly introduced queries? DISCUSSION: Implementations might require "helper" rendering commands be issued to implement certain operations like Clear, TexSubImage, etc. RESOLVED: They don't. Only application submitted rendering commands should have an effect on the results of the queries. Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when the driver is hacked to always perform glBufferData via a GPU staging copy (for debugging purposes). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-14 19:58:04 -07:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Jason Ekstrand	9b1e4bab6b	nir/builder: Add a nir_imm_zero helper v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	daaf777376	nir/builder: Move nir_imm_vec2 from blorp into the builder While we're here, fix a typo which caused it to actually return a vec4 with the third and fourth components zero. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Karol Herbst	bbf2ecaf35	intel/nir: use nir_src_is_const and nir_src_as_uint Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	6b1c398bcb	intel/nir: Take a nir_tex_instr and src index in brw_texture_offset This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.	2019-04-14 22:25:56 +02:00
Lionel Landwerlin	9e7b0988d6	anv: leave the top 4Gb of the high heap VMA unused In `628c9ca908` I forgot to apply the same -4Gb of the high address of the high heap VMA. This was previously computed in the HIGH_HEAP_MAX_ADDRESS. Many thanks to James for pointing this out. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Xiong, James <james.xiong@intel.com> Fixes: `628c9ca908` ("anv: store heap address bounds when initializing physical device") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-13 12:08:23 +00:00
Sagar Ghuge	066d2aebc0	intel/fs: Remove unused condition from opt_algebraic case We will never hit a condition where we have src1 and src2 as immediate operands. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 13:47:57 -07:00
Jason Ekstrand	7eaaff18cb	anv/pipeline: Fix MEDIA_VFE_STATE::PerThreadScratchSpace on gen7 We were always programming it with the Broadwell convention which is too large by a factor of two on Haswell and just plain wrong on IVB and BYT. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-04-12 16:08:35 +00:00
Karol Herbst	4a3c04a11f	glsl/nir: add support for lowering bindless images_derefs v2: handle atomics as well make use of nir_rewrite_image_intrinsic v3: remove call to nir_remove_dead_derefs v4: (Timothy Arceri) dont actually call lowering yet Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This is required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	3b2a9ffd60	nir: move brw_nir_rewrite_image_intrinsic into common code Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Lionel Landwerlin	628c9ca908	anv: store heap address bounds when initializing physical device We can then reuse those bounds to initialize the VMA heaps at logical device creation. This fixes an issue on EHL which has only 36bits of VMA. We were incorrectly using the fixed 48bits upper bound to initialize the logical device heap, resulting in addresses beyond the device's limits. v2: Don't confuse heap size (limited by system memory) and VMA size (limited by number of addressing bits the platform has) v3: Fix low heap vma_size :( (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: James Xiong <james.xiong@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-11 22:56:43 +01:00
Jason Ekstrand	316a98dec9	intel/common: Support bigger right-shifts with mi_builder Because why not?	2019-04-11 18:04:09 +00:00
Jason Ekstrand	0d6dea0ac8	anv/cmd_buffer: Use gen_mi_sub instead of gen_mi_add with a negative Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	d17dd46b09	anv: Move mi_memcpy and mi_memset to gen_mi_builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	bacb21fc6b	anv: Use gen_mi_builder for queries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	48da45891e	anv: Use gen_mi_builder for conditional rendering Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	a3b0894afc	anv: Use gen_mi_builder for indirect dispatch Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	b829dc30c1	anv: Use gen_mi_builder for indirect draw parameters Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	0122a6f037	anv: Use gen_mi_builder for computing resolve predicates Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	83b46ad6d8	anv: Use gen_mi_builder for CmdDrawIndirectByteCount Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	8b8deeca78	intel/common: Add unit tests for gen_mi_builder Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	2f7fcd103e	intel/common: Add a MI command builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Mark Janes	eda36feb2b	intel/tools: Remove redundant definitions of INTEL_DEBUG INTEL_DEBUG is declared extern and defined in gen_debug.c Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 13:15:33 -07:00
Mark Janes	2393cc7f00	intel/common: move gen_debug to intel/dev libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 13:15:33 -07:00
Lionel Landwerlin	3053d5a4f2	anv: don't use default pipeline cache for hits for VK_EXT_pipeline_creation_feedback If the user didn't provide a pipeline cache and we're using the default internal pipeline cache, then we shouldn't consider a cache hit for VK_EXT_pipeline_creation_feedback as the application did not provide a cache. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `6601e5d6fc` ("anv: implement VK_EXT_pipeline_creation_feedback") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-10 18:45:04 +01:00
Lionel Landwerlin	ed009e68c5	genxml: sort xml files using new script Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 18:24:03 +01:00
Lionel Landwerlin	903e142f0d	genxml: add a sorting script Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 18:23:34 +01:00
Juan A. Suarez Romero	ec7a33af58	anv: advertise 8 subtexel/mipmap precision bits So far ANV was advertising 4 bits for both subTexelPrecisionBits and mipmapPrecisionBits. But these values were not actually verified. But it seems the right value is actually 8 bits for both cases. Unfortunately Intel PRM does not clarify how many bits the hardware uses. For the mipmap case, there is the following reference in PRM Volume 6 (3D Media GPGPU), specifically in LOD Computation Pseudocode: ``` Bias: S4.8 MinLod: U4.8 MaxLod: U4.8 Base: U4.1 MIPCnt: U4 SurfMinLod: U4.8 ResMinLod: U4.8 ``` We have other clues, though: - On one side, dEQP-VK.texture.explicit_lod.* tests fail when using 4 bits, but work when using 8 bits. These tests try to mimic the expected behaviour as closely as possible, and they use the reported subTexelPrecisionBits and mipmapPrecisionBits to do this. - On the other side, the equivalent driver for Windows is reporting 8 bits for both elements. Not sure if they got to verify it from the PRM or from a different source. CC: Jason Ekstrand <jason@jlekstrand.net> CC: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-09 15:28:42 +00:00
Caio Marcelo de Oliveira Filho	45a4129392	anv: Implement VK_NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	94abc53030	intel/fs: Use NIR_PASS_V when lowering CS intrinsics This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	0425b34b79	intel/fs: Don't loop when lowering CS intrinsics This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After `060817b2` "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	3ee3024804	intel/fs: Add support for CS to group invocations in quads When using quads, instead of mapping the elements to the next 4 local invocation indices, we map the two next in the "current" row and two next in the "next row". A side effect is that a thread will execute the indices in a different order. We now perform the lowering of both local invocation ID and index together -- and don't rely anymore on lowering done by nir_lower_system_values. That is convenient when doing the math for quads, because we need X and Y to get the right invocation index. When the pass progresses, fold the constants and clean up to reduce the noise from the indexing math. This implements the derivative_group_quadsNV semantics from NV_compute_shader_derivatives. v2: Take subgroup_id into account, otherwise only values in the first subgroup would be used. (Jason) v3: Calculate invocation index and ID together, to avoid duplicating some math in the quads case when both index and ID are used. (Jason) v4: Don't call cleanup passes as part of the lowering, let that to the call site. (Jason) Change calculation to use less instructions. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	ef0339d5ea	intel/fs: Use TEX_LOGICAL whenever implicit lod is supported Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Lionel Landwerlin	48e48b8560	intel: add dependency on genxml generated files Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Cc: mesa-stable@lists.freedesktop.org	2019-04-08 20:52:47 +00:00
Lionel Landwerlin	ce790c96a9	anv: implement VK_KHR_swapchain revision 70 This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-08 18:27:02 +01:00


4060 commits