fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 20:08:06 +02:00

Author	SHA1	Message	Date
Francisco Jerez	c6455cfec9	intel/fs: Don't assume packed dispatch for fragment shaders on XeHP. The current packed dispatch assumptions for fragment shaders seem to be the reason that the fs-readFirstInvocation-uint-loop Piglit test-case for the ARB_shader_ballot extension fails on DG2 in combination with the patches in this series that enable pixel pipe hashing (thanks Jordan for reporting the regression). I've confirmed that the brw_fs_test_dispatch_packing() test fails on DG2 hardware for fragment shaders, while it succeeds for other shader stages, indicating that the PSD hardware no longer guarantees packed dispatch. Disable it. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>	2022-01-10 18:27:41 -08:00
Francisco Jerez	ffa2ca8a77	intel/xehp: Update 3DSTATE_PS maximum number of threads per PSD. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>	2022-01-10 18:27:41 -08:00
Francisco Jerez	8e21cad39b	intel/xehp: Implement XeHP workaround Wa_14014148106. Actually, no, there's no need to do anything, just update some comments for the record. An earlier revision of this change that implemented the workaround text to the letter required no less than 8 new PIPE_CONTROLs throughout the tree. However Felix Degrood noticed that the cost of some of the PIPE_CONTROLs was showing up in workloads like Shadow of the Tomb Raider. The Windows driver wasn't emitting many of those pipe controls, contrary to the W/A instructions, so we engaged in a back and forth with the hardware team, who concluded that the original suggested workaround was unnecessarily strict, and the Windows driver's behavior acceptable. It turns out that Wa_1408224581 we had already implemented for TGL is roughly equivalent to the Windows behavior, so no need to do anything new after all. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>	2022-01-11 00:17:32 +00:00
Francisco Jerez	eeb3f4594d	intel/xehp: Implement XeHP workaround Wa_14013910100. XeHP platforms require the invalidation of the instruction cache after a STATE_BASE_ADDRESS change due to a hardware bug potentially leading to instruction cache pollution. Note that the workaround text says it's applicable to "DG2 128/256/512-A/B", however it's also marked as permanent and not confirmed to be fixed in any specific stepping, so we apply it to all Gfx12HP platforms. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>	2022-01-11 00:17:32 +00:00
Jordan Justen	0fc93928f1	isl: Don't enable HDC:L1 caches on DG2 The MOCS entry used for this on Tigerlake doesn't exist on DG2. Ref: `aca31baafc` ("isl: Enable Tigerlake HDC:L1 caches via MOCS in various cases.") Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14467>	2022-01-10 21:20:03 +00:00
Konstantin Seurer	e0d590cafb	anv: Fixed maxFragmentCombinedOutputResources Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14320>	2022-01-10 19:28:17 +00:00
Danylo Piliaiev	b8d486f298	nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8 Adreno GPUs have native instructions for unsigned and mixed dot_4x8 but not for the signed dot product. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Lionel Landwerlin	07bc6b7ed9	anv: limit compiler valid color outputs using NIR variables This fixes a test from the vkd3d-proton test_dual_source_blending_dxbc test which asserts in the backend with: brw_fs_visitor.cpp:716: void fs_visitor::emit_fb_writes(): Assertion `!prog_data->dual_src_blend || key->nr_color_regions == 1' failed. This is because there are 2 color attachments provided by the renderpass so we initially set nr_color_regions = 2. But once we've parsed the shader, we can see it's only using one output (with dual source color blending). This change looks at the output variables to update the valid output variables. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14417>	2022-01-10 09:38:32 +02:00
Lionel Landwerlin	1d40d53e03	anv: don't leave anv_batch fields undefined Because the extend_cb vfunc is not initialized, there is a risk that the emission code calls into a random pointer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14418>	2022-01-07 17:28:11 +00:00
Tomeu Vizoso	c9adcb6051	anv/tests: Free BO cache and device mutex Was getting ASAN errors in CI when trying to add ANV to the debian-testing job: ==10993==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4194304 byte(s) in 64 object(s) allocated from: #0 0x7f763c1bda3c in __interceptor_posix_memalign ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:226 #1 0x55f43d28627f in os_malloc_aligned ../src/util/os_memory_aligned.h:58 #2 0x55f43d28627f in _util_sparse_array_node_alloc ../src/util/sparse_array.c:107 #3 0x55f43d28627f in util_sparse_array_get ../src/util/sparse_array.c:143 #4 0x55f43d1fdaba in anv_device_lookup_bo ../src/intel/vulkan/anv_private.h:1335 #5 0x55f43d1fdaba in anv_device_import_bo_from_host_ptr ../src/intel/vulkan/anv_allocator.c:1843 #6 0x55f43d1ff571 in anv_block_pool_expand_range ../src/intel/vulkan/anv_allocator.c:534 #7 0x55f43d1ffcb5 in anv_block_pool_init ../src/intel/vulkan/anv_allocator.c:417 #8 0x55f43d18f082 in run_test ../src/intel/vulkan/tests/block_pool_no_free.c:123 #9 0x55f43d1862b6 in main ../src/intel/vulkan/tests/block_pool_no_free.c:152 #10 0x7f763b942d09 in __libc_start_main ../csu/libc-start.c:308 Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14121>	2022-01-07 13:33:32 +00:00
Tomeu Vizoso	8a7659a7a2	anv/ci: Test with deqp-vk on Tiger Lake Run half of the CTS in 10 Volteer Chromebook devices. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14121>	2022-01-07 13:33:32 +00:00
Rohan Garg	af13119993	intel/fs: OpImageQueryLod does not support arrayed images as an operand When we lower SPIR-V to NIR for textures in vtn_handle_texture, we only bump the number of coordinate components when the op is not a lod query. Update the assert to take this into account. This fixes: - dEQP-VK.robustness.robustness2.bind.template.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - 
dEQP-VK.robustness.robustness2.bind.template.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - 
dEQP-VK.robustness.robustness2.push.notemplate.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag Fixes: `231337a1` ("intel/fs/xehp: Assert that the compiler is sending all 3 coords for cubemaps.") Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13925>	2022-01-07 10:53:35 +00:00
Francisco Jerez	054eb9f346	intel/dev: Implement DG2 restrictions requiring additional DSSes to be disabled. Note that this causes a geometry slice to be disabled if any DSS is fused off within that slice, which may seem stricter than the BSpec quotation implies, but testing shows that pixel pipes with any faulted DSS don't work at all, and that using a slice with any faulted pixel pipe leads to serious graphics corruption. It would be better to query this geometry topology information from the hardware instead of trying to reconstruct it here, but the kernel interface for that is not available yet. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14436>	2022-01-07 07:58:27 +00:00
Francisco Jerez	e48c29acca	intel/dev: Add support for pixel pipe subslice accounting on multi-slice GPUs. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14436>	2022-01-07 07:58:27 +00:00
Francisco Jerez	f3274e94fd	intel/dev: Fix size of device info num_subslices array. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14436>	2022-01-07 07:58:27 +00:00
Caio Oliveira	87e2d2249d	anv/blorp: Apply pending pipe flushes after PIPELINE_SELECT Allows the PIPELINE_SELECT change to consume any outstanding flushes. In case it doesn't, we still apply the pipe flushes afterwards. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14301>	2022-01-07 03:14:55 +00:00
Caio Oliveira	313aeee84b	anv: Use pending pipe control mechanism in flush_pipeline_select() This removes the repeated implementation of a workaround and a per-platform case. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14301>	2022-01-07 03:14:55 +00:00
Caio Oliveira	9ba7bc17d3	anv: Add another case to INTEL_DEBUG=pc output Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14301>	2022-01-07 03:14:55 +00:00
Jordan Justen	d57b10ab98	intel/compiler: Adjust TCS instance-id for dg2+ Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14385>	2022-01-05 16:13:28 -08:00
Jason Ekstrand	a9321b1309	anv: Use the common QueueSignalReleaseImageANDROID from RADV This is an actual functional change as we now plumb through the sync FD instead of doing a vkQueueSubmit and trusting in implicit sync. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Tested-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14372>	2022-01-05 16:36:10 +00:00
Jason Ekstrand	dfb1e1777c	anv,radv,v3dv: Move AcquireImageANDROID to common code All three implementations are identical. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Tested-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: Roman Stratiienko <r.stratiienko@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14372>	2022-01-05 16:36:10 +00:00
Uday Kiran Pichika	78ef08a061	anv: enable adaptive sync for ANV Signed-off-by: Uday Kiran Pichika <pichika.uday.kiran@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6420>	2022-01-02 18:53:29 +00:00
Henry Goffin	fe617bcca0	intel/compiler/test: Fix build with GCC 7 Without this change, test_fs_scoreboard.cpp does not compile on GCC 7 due to the use of C99 initializers in a C++ source file. Fixes: `c847bfaaf5` ("intel/fs/gen12: Add tests for scoreboard pass") Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14349>	2021-12-30 19:59:52 +00:00
Dave Airlie	a2293e33fd	intel/genxml/gen4-5: fix more Raster Operation in BLT to be a uint This has been partly fixed twice before, but looks like some got missed. Fixes arb_copy_image on gen4 with crocus Fixes: `de625dddee` ("intel/genxml: fix raster operation field in blt genxml") Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14345>	2021-12-30 11:40:33 +10:00
Lionel Landwerlin	eca7b24e74	intel/devinfo: adjust subslice array size Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14324>	2021-12-28 14:22:53 -08:00
Dave Airlie	4392c24844	intel/compiler: drop unused declaration Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Dave Airlie	2692a5f8db	intel/compiler: don't lower swizzles in backend. These are lowered by crocus in the frontend, the key entries are still used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Dave Airlie	e12b0d0d60	intel/compiler: remove gfx6 gather wa from backend. Crocus lowers this in the frontend, the key member is still used but reset prior to backend. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Marcin Ślusarz	a48f1d51e2	intel/compiler: disable workaround not applicable to gfx >= 11 There's nothing in bspec that would suggest this is still needed. It only affected gfx 9 and 10. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14280>	2021-12-22 10:13:25 +00:00
Caio Oliveira	ac90519e35	anv: Simplify assertions related to graphics stages In all three cases, COMPUTE was on the table but with an invalid value (zero). Drop it from the tables and the extra assertion, so if a COMPUTE is passed it will just fail the ARRAY_SIZE assertion. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14274>	2021-12-21 18:25:05 +00:00
Caio Oliveira	de916d827f	anv: Refactor dirty masking in cmd_buffer_flush_state Instead of masking the dirty variable itself, use an appropriate mask in the users of dirty. This will avoid extra tracking when dealing with Task/Mesh later. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14275>	2021-12-21 11:07:31 +00:00
Caio Oliveira	37fca614b8	anv/blorp: Split blorp_exec into a render and compute And set the relevant push_constants_dirty for each case. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14275>	2021-12-21 11:07:31 +00:00
Francisco Jerez	e7470a40c5	intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction. This adds a missing CFG edge that represents a possible physical control flow path the EU might take under some conditions which isn't part of the logical CFG of the program. This possibility shouldn't have led to problems on platforms prior to Gfx12, since the missing control flow edge cannot possibly influence liveness intervals. However on Gfx12+ it becomes the compiler's responsibility to resolve data dependencies across instructions, and the missing physical control flow paths may lead to a WaR data hazard currently not visible to the software scoreboard pass, which could lead to data corruption. Worse, the possibility for this path to be taken by the EU increases on Gfx12+ due to a hardware bug affecting EU fusion -- However the same physical path can be potentially taken on earlier platforms as well, so this patch extends the CFG on all platforms for consistency, even though the lack of this edge shouldn't lead to any functional issues on platforms earlier than Gfx12. There are no shader-db changes on earlier platforms, so there seems to be no disadvantage from using the same CFG representation as on later platforms. This issue has been reported on TGL with the following conformance test, thanks to Ian for bringing the FULSIM dependency check warning to my attention: dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store Fixes: `4d1959e693` ("intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4940 Reported-by: Tapani Pälli <tapani.palli@intel.com> Reported-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14248>	2021-12-21 00:43:29 +00:00
Rafael Antognolli	e9b509755b	intel: Emit 3DSTATE_BINDING_TABLE_POOL_ALLOC for XeHP On XeHP+, Binding Table Pointers are no longer an offset relative to the Surface State Base Address. Instead, they are relative to the State Binding Table Pool Address, which is set by the command above. We emit that command (pointing to the same address as the Surface State Base Address), and everything should stay working as before. Reworks: * Jordan: Add iris * Jordan: Drop i965 * Ken: Set MOCS to avoid a major perf impact. (Found by Felix DeGrood.) * Jordan: Shrink size from 2MiB to actual iris, anv usage * Lionel: Add BINDING_TABLE_POOL_BLOCK_SIZE Ref: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4995 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [jordan.l.justen@intel.com: Add Iris, adjust sizes] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13992>	2021-12-20 17:58:13 +00:00
Jordan Justen	e6fc231184	anv: Add BINDING_TABLE_POOL_BLOCK_SIZE Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13992>	2021-12-20 17:58:13 +00:00
Jordan Justen	1ed7a65e6d	intel/genxml/12.5: Remove bt-pool enable from 3DSTATE_BINDING_TABLE_POOL_ALLOC This was dropped in gfx12.5. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13992>	2021-12-20 17:58:13 +00:00
Jason Ekstrand	eebb2dedb2	intel/fs: Add a NONE scheduling mode While our LIFO scheduling mode attempts to optimize for register pressure, it's often hard for a scheduling algorithm to do better than the instruction order provided by the shader author. Shader authors often do perfectly reasonable things like using texture results immediately after fetching them or constructing texture coordinates immediately before the texture op. When we throw all the instruction ordering information away, we lose any help the author may have given us. So we now attempt NONE before we fall back to the worst case LIFO mode. And, yes, I tried this with NONE both before and after LIFO and doing NONE before LIFO is substantially better, according to shader-db. total instructions in shared programs: 19673152 -> 19665202 (-0.04%) instructions in affected programs: 33669 -> 25719 (-23.61%) helped: 20 HURT: 0 helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107 helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12% 95% mean confidence interval for instructions value: -867.61 72.61 95% mean confidence interval for instructions %-change: -21.74% -7.46% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 935562500 -> 935020920 (-0.06%) cycles in affected programs: 18620349 -> 18078769 (-2.91%) helped: 104 HURT: 48 helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680 helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87% HURT stats (abs) min: 10 max: 54724 x̄: 6118.62 x̃: 1530 HURT stats (rel) min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46% 95% mean confidence interval for cycles value: -5724.34 -1401.71 95% mean confidence interval for cycles %-change: -9.86% -4.10% Cycles are helped. 
total spills in shared programs: 12158 -> 10327 (-15.06%) spills in affected programs: 1831 -> 0 helped: 20 HURT: 0 total fills in shared programs: 14749 -> 12635 (-14.33%) fills in affected programs: 2114 -> 0 helped: 20 HURT: 0 LOST: 8 GAINED: 649 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	e6ddee764e	intel/fs: Reset instruction order before re-scheduling The way the current scheduler loop is implemented, each scheduling pass starts with what the previous pass had. This means that, if PRE screwed everything up majorly, PRE_NON_LIFO would have to try to fix it. It also meant that tiny changes to one pass would affect every later pass. Instead, reset the order of the instructions before each scheduling pass. This makes the passes entirely independent of each other. Shader-db results on Ice Lake: total instructions in shared programs: 19670486 -> 19670648 (<.01%) instructions in affected programs: 25317 -> 25479 (0.64%) helped: 2 HURT: 7 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07% HURT stats (abs) min: 8 max: 70 x̄: 24.29 x̃: 12 HURT stats (rel) min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87% 95% mean confidence interval for instructions value: -1.28 37.28 95% mean confidence interval for instructions %-change: -0.04% 2.30% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 935535948 -> 935490243 (<.01%) cycles in affected programs: 421994824 -> 421949119 (-0.01%) helped: 1269 HURT: 879 helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52 helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14% HURT stats (abs) min: 1 max: 29931 x̄: 322.46 x̃: 20 HURT stats (rel) min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22% 95% mean confidence interval for cycles value: -71.37 28.81 95% mean confidence interval for cycles %-change: -0.11% 0.21% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12403 -> 12430 (0.22%) spills in affected programs: 1355 -> 1382 (1.99%) helped: 2 HURT: 7 total fills in shared programs: 15128 -> 15182 (0.36%) fills in affected programs: 3294 -> 3348 (1.64%) helped: 2 HURT: 7 LOST: 21 GAINED: 28 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	d49d092259	Revert "intel/fs: Do cmod prop again after scheduling" This reverts commit `ba2fa1ceaf`. Doing optimizations after scheduling but before RA means doing them in the middle of the scheduling loop which introduces additional dependencies between one scheduling iteration and the next. That won't work if we want to make the scheduling modes independent, at least not unless we have some way of fully cloning the IR. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	e6f0def97d	intel/eu: Don't double-loop as often in brw_set_uip_jip brw_find_next_block_end() scans through the instructions to find the end of the block. We were calling it for every instruction in the program which, if you have a single basic block, makes the whole mess a nice clean O(n^2) when it really doesn't need to be. Instead, only call brw_find_next_block_end() as-needed. This brings it back to O(n) like it should have been. This cuts the runtime of the following Vulkan CTS on my SKL box by 5% from 1:51 to 1:45: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	cf98a3cc19	intel/fs: Use OPT() for split_virtual_grfs Now that we're being conservative in the pass, it's easy to tell when it makes progress and we can put it in the OPT() macro. This way, we get nice INTEL_DEBUG=optimizer dumps for it. While we're here, fix the header comment which is massively out-of-date. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	38fa18a7a3	intel/fs: Be more conservative in split_virtual_grfs Instead of modifying every single instruction, keep track of which VGRFs are actually split in a bit-set, and only modify the instructions that actually touch split regs. This cuts the runtime of the following Vulkan CTS on my SKL box by 45% from 3:21 to 1:51: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	288a670f17	anv/pipeline: Get rid of sample_shading_enable Putting it in the pipeline is a bit of a lie. We no longer need it for nir_lower_wpos_center. The only other user is pipeline_has_coarse_pixel and that is used to build the shader key which we construct before we've processed any NIR so we don't have accurate information at that time anyway. Instead, look at ms_info->sampleShadingEnable directly in pipeline_has_coarse_pixel and trust the back-end to deal with disabling coarse when we need per-sample dispatch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	deec7a590b	anv,nir: Use sample_pos_or_center in lower_wpos_center Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	3c89dbdbfe	intel/fs: Implement the sample_pos_or_center system value Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	a580fd55e1	intel/fs: Rework emit_samplepos_setup() This rolls compute_sample_position into emit_samplepos_setup, its only caller, by using a loop instead of calling it twice. We also early-return for the !persample_dispatch case instead of doing it as part of the sample calculation. This means that we don't call fetch_payload_reg() to get sample_pos_reg unless we're actually going to use it so the function is safe to call even if we haven't set up sample_pos_reg. This will be important for the next commit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	ac7255ed1e	intel/fs: Return fs_reg directly from builtin setup helpers There's no good reason why we're allocating them on the heap and returning a pointer. Return the fs_reg directly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	3878094eb1	anv: Drop anv_sync_create_for_bo The older helper is unused so we can roll it all into anv_create_sync_for_memory. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14237>	2021-12-17 00:55:31 +00:00
Lionel Landwerlin	b00086d393	anv,wsi: simplify WSI synchronization Rather than using 2 vfuncs, use one since we've unified the synchronization framework in the runtime with a single vk_sync object. v2 (Jason Ekstrand): - create_sync_for_memory is now in vk_device Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14237>	2021-12-17 00:55:31 +00:00
Jason Ekstrand	9ae1e621e5	anv: Implement vk_device::create_sync_for_memory Fixes: `36ea90a361` ("anv: Convert to the common sync and submit framework") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14237>	2021-12-17 00:55:31 +00:00

7453 commits