fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-02 03:38:06 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	ddacd3d43b	intel/perf: fix improper pointer access This expression was unused by the macro, probably why it didn't register in the compilation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-04 09:21:15 +00:00
Lionel Landwerlin	8c0b058263	intel/perf: simplify the processing of OA reports This is a more accurate description of what happens in processing the OA reports. Previously we only had a somewhat difficult to parse state machine tracking the context ID. What we really only need to do to decide if the delta between 2 reports (r0 & r1) should be accumulated in the query result is: * whether the r0 is tagged with the context ID relevant to us * if r0 is not tagged with our context ID and r1 is: does r0 have an invalid context ID? If not then we're in a case where i915 has resubmitted the same context for execution through the execlist submission port v2: Update comment (Ken) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-04 09:21:15 +00:00
Lionel Landwerlin	b364e920bf	intel/perf: take into account that reports read can be fairly old If we read the OA reports late enough after the query happens, we can get a timestamp in the report that is significantly in the past compared to the start timestamp of the query. The current code must deal with the wraparound of the timestamp value (every ~6 minutes). So consider that if the difference is greater than half that wraparound period, we're probably dealing with an old report and make the caller aware it should read more reports when they're available. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-04 09:21:15 +00:00
Lionel Landwerlin	9d0a5c817c	intel/perf: set read buffer len to 0 to identify empty buffer We always add an empty buffer in the list when creating the query. Let's set the len appropriately so that we can recognize it when we read OA reports up to the end of a query. We were using a 0 timestamp value associated with the empty buffer and incorrectly assuming this was a valid value. In turn that led to not reading enough reports and resulted in deltas added to our counter values which should have been discarded because those would be flagged for a different context. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-04 09:21:15 +00:00
Lionel Landwerlin	acea59dbf8	intel/perf: fix invalid hw_id in query results Accumulation happens between 2 reports, it can be between a start/end report from another context. So only consider updating the hw_id of the results when it's not already valid and we have a valid value to put in there. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `41b54b5faf` ("i965: move OA accumulation code to intel/perf") Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-04 09:21:15 +00:00
Jason Ekstrand	178a2946c0	anv: Respect the always_flush_cache driconf option Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-03 17:10:51 -06:00
Jason Ekstrand	b1f37688ba	anv: Set up SBE_SWIZ properly for gl_Viewport gl_Viewport is also in the VUE header so we need to whack the read offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that case as well. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-03 16:20:50 +00:00
Ian Romanick	d15344c0f5	intel/compiler: Increase nir_opt_peephole_select threshold I tried 2, 4, 6, 8, and 10. 8 seemed to be the sweet spot across all Intel platforms. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14736141 -> 14661140 (-0.51%) instructions in affected programs: 2272413 -> 2197412 (-3.30%) helped: 8416 HURT: 140 helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6 helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20% HURT stats (abs) min: 1 max: 140 x̄: 4.73 x̃: 1 HURT stats (rel) min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60% 95% mean confidence interval for instructions value: -9.36 -8.17 95% mean confidence interval for instructions %-change: -4.14% -3.99% Instructions are helped. total cycles in shared programs: 231560416 -> 228585416 (-1.28%) cycles in affected programs: 126536021 -> 123561021 (-2.35%) helped: 7092 HURT: 1898 helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159 helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77% HURT stats (abs) min: 1 max: 14518 x̄: 371.91 x̃: 36 HURT stats (rel) min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55% 95% mean confidence interval for cycles value: -514.34 -147.50 95% mean confidence interval for cycles %-change: -9.69% -9.14% Cycles are helped. 
total spills in shared programs: 5763 -> 5848 (1.47%) spills in affected programs: 1797 -> 1882 (4.73%) helped: 13 HURT: 13 total fills in shared programs: 17163 -> 16931 (-1.35%) fills in affected programs: 7214 -> 6982 (-3.22%) helped: 22 HURT: 19 total sends in shared programs: 730410 -> 730246 (-0.02%) sends in affected programs: 2705 -> 2541 (-6.06%) helped: 114 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88% 95% mean confidence interval for sends value: -1.55 -1.33 95% mean confidence interval for sends %-change: -7.90% -6.62% Sends are helped. LOST: 4 GAINED: 0 Sandy Bridge total instructions in shared programs: 10760511 -> 10724637 (-0.33%) instructions in affected programs: 961305 -> 925431 (-3.73%) helped: 3734 HURT: 110 helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8 helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95% HURT stats (abs) min: 1 max: 20 x̄: 1.68 x̃: 1 HURT stats (rel) min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52% 95% mean confidence interval for instructions value: -9.76 -8.91 95% mean confidence interval for instructions %-change: -4.90% -4.63% Instructions are helped. total cycles in shared programs: 153359411 -> 152991077 (-0.24%) cycles in affected programs: 11615401 -> 11247067 (-3.17%) helped: 2725 HURT: 1138 helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80 helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91% HURT stats (abs) min: 1 max: 4351 x̄: 69.69 x̃: 25 HURT stats (rel) min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47% 95% mean confidence interval for cycles value: -103.18 -87.52 95% mean confidence interval for cycles %-change: -4.57% -3.97% Cycles are helped. 
total sends in shared programs: 584038 -> 583855 (-0.03%) sends in affected programs: 3512 -> 3329 (-5.21%) helped: 157 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1 helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06% 95% mean confidence interval for sends value: -1.26 -1.07 95% mean confidence interval for sends %-change: -7.17% -5.87% Sends are helped. LOST: 23 GAINED: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8122617 -> 8111592 (-0.14%) instructions in affected programs: 380503 -> 369478 (-2.90%) helped: 912 HURT: 86 helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9 helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57% HURT stats (abs) min: 1 max: 2 x̄: 1.05 x̃: 1 HURT stats (rel) min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36% 95% mean confidence interval for instructions value: -12.00 -10.10 95% mean confidence interval for instructions %-change: -3.56% -3.10% Instructions are helped. total cycles in shared programs: 188509780 -> 188534398 (0.01%) cycles in affected programs: 7211542 -> 7236160 (0.34%) helped: 859 HURT: 132 helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16 helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33% HURT stats (abs) min: 2 max: 1592 x̄: 489.67 x̃: 618 HURT stats (rel) min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26% 95% mean confidence interval for cycles value: 9.58 40.10 95% mean confidence interval for cycles %-change: 0.65% 2.93% Cycles are HURT.	2019-12-02 16:46:20 -08:00
Jason Ekstrand	a8965c076b	anv: Push constants are relative to dynamic state on IVB Fixes: `aecde2351` "anv: Pre-compute push ranges for graphics pipelines" Closes: #2136 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-26 22:15:54 +00:00
Jason Ekstrand	854859fefa	anv/entrypoints: Better handle promoted extensions In the case of promoted extensions we can end up with an entrypoint that we support being an alias of an entrypoint we do not support. For instance, if an extension gets promoted from EXT to KHR, the EXT entrypoints may be aliases of the KHR ones. We want to leave everything as EXT until we get around to advertising the KHR so that we don't break things when we update the XML and headers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-26 02:48:42 +00:00
Samuel Pitoiset	d6db858771	meson: only build imgui when needed Only required for Intel tools or the Vulkan overlay layer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-11-25 07:51:56 +00:00
Ian Romanick	e51eda99df	intel/fs: Disable conditional discard optimization on Gen4 and Gen5 The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of valid data and 31 bits of junk. Results of comparisons that are used as Boolean values need to have a fixup applied to generate the proper 0/~0 values. Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup code from being generated. This results in a sequence like: cmp.l.f0.0(16) g8<1>F g14<8,8,1>F 0x0F /* 0F */ ... cmp.l.f0.0(16) g4<1>F g6<8,8,1>F 0x0F /* 0F */ (+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD g8<8,8,1>UD instead of cmp.l.f0.0(16) g8<1>F g14<8,8,1>F 0x0F /* 0F */ ... cmp.l.f0.0(16) g4<1>F g6<8,8,1>F 0x0F /* 0F */ or(16) g4<1>UD g4<8,8,1>UD g8<8,8,1>UD (+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD 1UD I examined a couple of the shaders hurt by this change, and ALL of them would have been affected by this bug. :( Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836 Fixes: `0ba9497e66` ("intel/fs: Improve discard_if code generation") Iron Lake total instructions in shared programs: 8122757 -> 8122957 (<.01%) instructions in affected programs: 8307 -> 8507 (2.41%) helped: 0 HURT: 100 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76% 95% mean confidence interval for instructions value: 2.00 2.00 95% mean confidence interval for instructions %-change: 2.58% 3.03% Instructions are HURT. total cycles in shared programs: 188510100 -> 188510376 (<.01%) cycles in affected programs: 76018 -> 76294 (0.36%) helped: 0 HURT: 55 HURT stats (abs) min: 2 max: 12 x̄: 5.02 x̃: 4 HURT stats (rel) min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56% 95% mean confidence interval for cycles value: 4.33 5.71 95% mean confidence interval for cycles %-change: 0.60% 1.12% Cycles are HURT. 
GM45 total instructions in shared programs: 4994403 -> 4994503 (<.01%) instructions in affected programs: 4212 -> 4312 (2.37%) helped: 0 HURT: 50 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72% 95% mean confidence interval for instructions value: 2.00 2.00 95% mean confidence interval for instructions %-change: 2.45% 3.07% Instructions are HURT. total cycles in shared programs: 128928750 -> 128928982 (<.01%) cycles in affected programs: 67442 -> 67674 (0.34%) helped: 0 HURT: 47 HURT stats (abs) min: 2 max: 12 x̄: 4.94 x̃: 4 HURT stats (rel) min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53% 95% mean confidence interval for cycles value: 4.19 5.68 95% mean confidence interval for cycles %-change: 0.50% 1.00% Cycles are HURT.	2019-11-21 16:40:50 -08:00
Jason Ekstrand	2fca325ea6	Revert "i965/fs: Merge CMP and SEL into CSEL on Gen8+" This reverts commit `52c7df1643`. The pass, while clearly useful for some shaders, has at least three bugs that I was able to find fairly quickly: 1. It doesn't work for type-converting MOVs because f > 0 is not the same as f2i(f) > 0 2. CSEL is a 3src instruction and only supports one source type; it doesn't take this into account and tries to create instructions which do a F compare and a D select. This is especially nasty to debug because you don't see that in the dumped assembly because we don't properly assert that types are the same in codegen. 3. While you can handle 2, in theory, by reinterpreting types, you can't do that in the presence of source modifiers. This pass doesn't even attempt to detect that. Those are just the ones I found with the one almost trivial shader I was debugging. There very likely may be more. Best thing to do for now is just shut it off until someone has the time to figure out how to do this properly and write tests to ensure it's correct. Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+" Reviewed-by: Brian Paul <brianp@vmware.com>	2019-11-20 20:47:32 +00:00
Marek Olšák	ebe7579655	nir: move data.image.access to data.access The size of the data structure doesn't change. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-19 18:20:05 -05:00
Eric Engestrom	51e214c1db	anv: add missing "fall-through" annotation CoverityID: 1455884 Fixes: `c1c346f166` ("anv: implement VK_KHR_separate_depth_stencil_layouts") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-19 22:03:00 +00:00
Rafael Antognolli	dadb6ebbd1	intel: Add workaround for stencil state. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-11-19 21:43:09 +00:00
Iván Briano	ca94717035	intel/compiler: Don't change hstride if not needed Alignment requirements may have changed the horizontal stride already, so don't set it if not required to avoid breaking said requirements. Fixes several tests such as dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t Signed-off-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-18 14:19:41 -08:00
Jason Ekstrand	fdaf8144a8	anv: Emit a NULL vertex for zero base_vertex/instance If both are zero (the common case), we can emit a null vertex buffer rather than emitting a vertex buffer with zeros in it. The packing of the VERTEX_BUFFER_STATE is faster because no relocation is emitted and we can avoid creating the vertex buffer which means one less anv_state_stream_alloc. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	bc9d7836bc	anv: Use an anv_state for the next binding table This is a bit more natural because we're already getting an anv_state most places in the pipeline. The important part here, however, is that we're no longer calling anv_block_pool_map on every alloc_binding_table call. While it's probably pretty cheap, it is potentially a linear walk over the list of BOs and it was showing up in profiles. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	98dc179c1e	anv: More carefully dirty state in BindPipeline Instead of blindly dirtying descriptors and push constants the moment we see a pipeline change, check to see if it actually changes the bind layout or push constant layout. This doubles the runtime performance of one CPU-limited example running with the Dawn WebGPU implementation when running on my laptop. NOTE: This effectively reverts `beca63c6c0`. While it was a nice optimization, it was based on prog_data and we can't do that anymore once we start allowing the same binding table to be used with multiple different pipelines. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	22f16ff54a	anv: More carefully dirty state in BindDescriptorSets Instead of dirtying all graphics or all compute based on binding point, we're now much more careful. We first check to see if the actual descriptor set changed and then only dirty the stages used by that descriptor set. For dynamic offsets, we keep a bitfield per-stage of which offsets are actually used in that stage and we only dirty push constants and descriptors if that stage has dynamic offsets AND those offsets actually change. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	ca8117b5d5	anv: Use a switch statement for binding table setup It theoretically could be more efficient but the real point here is that it's no longer really a matter of dealing with special cases and then the "real" thing. The way we're handling binding tables, it's more of a multi-step process and a switch is more natural. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	9baa33cef0	anv: Rework push constant handling This substantially reworks both the state setup side of push constant handling and the pipeline compile side. The fundamental change here is that we're no longer respecting the prog_data::param array and instead are just instructing the back-end compiler to leave the array alone. This makes the state setup side substantially simpler because we can now just memcpy the whole block of push constants and don't have to upload one DWORD at a time. This also means that we can compute the full push constant layout up-front and just trust the back-end compiler to not mess with it. Maybe one day we'll decide that the back-end compiler can do useful things there again but for now, this is functionally no different from what we had before this commit and makes the NIR handling cleaner. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	ca91ab8015	anv: Re-arrange push constant data a bit This moves the compute stuff into a anv_push_constants::cs sub-struct. It also moves dynamic offsets into the push constants. This means we have to duplicate the data per-stage but that doesn't seem like the end of the world and one day we may wish to make dynamic offsets per-stage anyway. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	d1c4e64a69	intel/compiler: Add a flag to avoid compacting push constants In vec4, we can just not run the pass. In fs, things are a bit more deeply intertwined. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	aecde23519	anv: Pre-compute push ranges for graphics pipelines It turns out that emitting push constants is one of the hottest paths in the driver and ANY work we do there costs us. By pre-computing things a bit ahead of time, we shave 5% off the runtime of a CPU-limited example running with the Dawn WebGPU implementation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	4b392ced2d	anv: Stop bounds-checking pushed UBOs The bounds checking is actually less safe than just pushing the data. If the bounds checking actually ever kicks in and it's not on the last UBO push range, then the shrinking will cause all subsequent ranges to be pushed to the wrong place in the GRF. One of the behaviors we definitely don't want is for OOB UBO access to result in completely unrelated UBOs returning garbage values. It's safer to just push the UBOs as-requested. If we're really concerned about robustness, we can emit shader code to do bounds checking which should be stupid cheap (a CMP followed by SEL). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	ebad00d9e7	anv: Delete dead shader constant pushing code As of `2d78e55a8c`, nir_intrinsic_load_constant with a constant offset is constant-folded so we should never end up with any that trigger brw_nir_analyze_ubo_ranges. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	0709c0f6b4	anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layout This lets us stop tracking the pipeline layout. It also means less indirection on a very hot path. As an extra bonus, we can make some of our data structures smaller. No measurable CPU overhead improvement. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	fa120cb31c	anv: Input attachments are always single-plane Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	0a02f2a278	genxml: Mark everything in genX_pack.h always_inline Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	abfd4651ed	anv/pipeline: Assume layout != NULL In the early days of the driver we allowed layout to be VK_NULL_HANDLE and used that for some internal pipelines when we wanted to be lazy. Vulkan doesn't actually allow NULL layouts, however, so there's no reason to have this check. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Italo Nicola	59623f211b	intel/compiler: remove old comment This comment was correct some time ago, but since commit `d3c10ad427`, it isn't true anymore. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-11-18 10:20:34 -08:00
Lionel Landwerlin	c061185e17	intel/perf: add EHL performance query support Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-11-15 13:14:30 +00:00
Lionel Landwerlin	39fd11a9f8	intel/dev: flag the Elkhart Lake platform We'll use this for performance metrics which are different from ICL. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-11-15 13:14:30 +00:00
Danylo Piliaiev	0904ee0c60	intel/fs: Do not lower large local arrays to scratch on gen7 On gen7 and earlier the scratch space size is limited to 12kB. By enabling this optimization we may easily exceed this limit without having any fallback. arb_compute_shader/linker/bug-93840.shader_test crashes with this lowering on IVB due to exceeding scratch size limit. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092 Fixes: `69244fc7` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-14 20:08:30 +00:00
Paulo Zanoni	eb6352162d	intel/compiler: fix nir_op_{i,u}*32 on ICL On ICL we have the src1 restriction which is applied through fix_byte_src() and potentially changes the type of the operands from 8 to 32 bits. When this change happens, we fall into the "else if (bit_size < 32)" case and miscompute src_type because it takes into consideration bit_size (8) instead of the adjusted size of temp_op (32). This results in the shader reading unused memory, giving us mostly failures, but occasional passes due to whatever was already in the registers we were reading. This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as: dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2 This can also be verified by simply changing fix_byte_src() to apply on all platforms. Fixes: `5847de6e9a` ("intel/compiler: don't use byte operands for src1 on ICL") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-11-13 22:13:52 +00:00
Caio Marcelo de Oliveira Filho	0aaf47f7cd	anv: Initialize depth_bounds_test_enable when not explicitly set This was causing uninitialized value to end up propagated to the 3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet building due to the value being greater than 1. Fixes: `939ddccb7a` ("anv: Add support for depth bounds testing.") Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2019-11-13 10:13:27 -08:00
Rafael Antognolli	d4f628235e	anv: Use mocs settings from isl_dev. v2: Remove device->default_mocs and external_mocs (Jason). Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-12 20:41:52 +00:00
Rafael Antognolli	2b01636ddb	intel/isl: Add MOCS settings to isl_device. Centralize mocs settings into isl. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-12 20:41:52 +00:00
Danylo Piliaiev	d4c8182018	intel/blorp: Fix usage of uninitialized memory in key hashing The automatically generated padding in structs contains undefined values, force pack the structs to eliminate the padding. Otherwise structs with the same values may generate different hashes. Valgrind output: Conditional jump or move depends on uninitialised value(s) util_fast_urem32 (fast_urem_by_const.h:71) hash_table_search (hash_table.c:262) _mesa_hash_table_search (hash_table.c:296) anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318) anv_pipeline_cache_search (anv_pipeline_cache.c:335) lookup_blorp_shader (anv_blorp.c:38) blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112) blorp_mcs_partial_resolve (blorp_clear.c:1205) anv_image_mcs_op (anv_blorp.c:1742) anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774) transition_color_buffer (genX_cmd_buffer.c:1159) cmd_buffer_end_subpass (genX_cmd_buffer.c:4840) Uninitialised value was created by a stack allocation blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103) Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-12 13:59:29 +02:00
Lionel Landwerlin	34f32a6d66	anv: implement VK_KHR_timeline_semaphore v2: Fix inverted condition in vkGetPhysicalDeviceExternalSemaphoreProperties() v3: Add anv_timeline_* helpers (Jason) v4: Avoid variable shadowing (Jason) Split timeline wait/signal device operations (Jason/Lionel) v5: s/point/signal_value/ (Jason) Drop piece of drm-syncobj timeline code (Jason) v6: Add missing sync_fd semaphore signaling (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
Jason Ekstrand	5a4f15ef2c	anv: Plumb timeline semaphore signal/wait values through from the API Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-11 21:46:51 +00:00
Lionel Landwerlin	edc6606d4e	anv/wsi: signal the semaphore in the acquireNextImage We seem to have forgotten about the semaphore in the acquireNextImageInfo. v2: Signal semaphore/fence regardless of presentation status (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
Jason Ekstrand	b10b455c1d	anv: Lock around fetching sync file FDs from semaphores Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-11 21:46:51 +00:00
Lionel Landwerlin	246261f0ad	anv: prepare the driver for delayed submissions Timeline semaphores introduce support for wait before signal behavior, which means that it is now allowed to call vkQueueSubmit() with wait semaphores not yet submitted for execution. Our kernel driver requires all of the wait primitives to be created before calling the execbuf ioctl. As a result, we must delay submissions in the userspace driver. This change stores the necessary information to be able to delay a VkSubmitInfo submission to the kernel driver. v2: Fold count++ into array access (Jason) Move queue list to another patch (Jason) v3: Document cleanup of temporary semaphores (Jason) v4: Track semaphores of SYNC_FD type that need updating after delayed submission v5: Don't forget to update sync_fd in signaled semaphores after submission (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
Lionel Landwerlin	3e22363537	anv: refcount semaphores Delayed submissions required by timeline semaphores mean we need to be able to update the sync fd backed semaphores in a delayed fashion. This could mean a race between the application destroying the semaphore and the submission code trying to update it with the new sync fd. This change prepares semaphores to be refcounted, we'll most likely only take a reference for cases where we signal a sync fd semaphore. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
Lionel Landwerlin	3da798c9f1	anv: prepare driver to report submission error through queues When we will submit to i915 from a submission thread, we won't be able to directly report the error to the user (in particular through the debug report callbacks). So prepare 2 paths to report errors device -> notifying the user immediately, queue -> notifying the user the next time an entry point is called. In this change we still report directly for both paths, this will change in the next commit. v2: Split NULL batch parameter handling in anv_queue_submit_simple_batch() in a different commit Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
Lionel Landwerlin	89de271bc2	anv: allow NULL batch parameter to anv_queue_submit_simple_batch We can reuse device->trivial_batch_bo Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
Lionel Landwerlin	f606c12731	anv: move queue init/finish to anv_queue.c Prepare the queue initialization to take on more responsibilities and possibly fail. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-11 21:46:51 +00:00
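
Note on b364e920bf (old OA reports and timestamp wraparound): the half-period check described in that commit message can be sketched as below. This is only an illustration, not the code from the commit. It assumes a 32-bit timestamp counter, and the names report_predates_query and HALF_WRAP_PERIOD are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the timestamp is a 32-bit counter, so half
 * of its wraparound period is 2^31 ticks. */
#define HALF_WRAP_PERIOD (UINT32_C(1) << 31)

/* Hypothetical helper, not Mesa's actual function. Returns true when the
 * report timestamp most likely lies before the query start, i.e. the
 * caller should keep reading newer reports. */
static bool
report_predates_query(uint32_t start_ts, uint32_t report_ts)
{
   /* Unsigned subtraction wraps modulo 2^32, so a report slightly behind
    * the start shows up as a very large forward distance. */
   uint32_t delta = report_ts - start_ts;

   return delta > HALF_WRAP_PERIOD;
}
```

With these assumptions, a report taken just before the counter wraps is still classified as older than a query that started just after the wrap, which a plain report_ts < start_ts comparison would get wrong.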

4923 commits