fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-17 04:50:19 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	7628585dd7	anv: Refactor setting descriptors with immutable sampler Don't call anv_sampler_from_handle if the handle may be invalid. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>	2020-05-04 14:06:27 +00:00
Jason Ekstrand	73fb7cdbe1	vulkan,anv: Move the DEFINE_HANDLE_CASTS macros to vk_object.h We've already got these duplicated a bunch of places. They should really probably live in common code. The new versions take two more arguments: 1. The struct member which gets you from __driver_type to the vk_object_base. This requires drivers which use this to also use vk_object_base. 2. The VkObjectType enum which represents that object type. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>	2020-05-04 14:06:27 +00:00
Jason Ekstrand	682c81bdfb	vulkan,anv: Add a base object struct type Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>	2020-05-04 14:06:27 +00:00
Jason Ekstrand	369703774c	anv: Allocate CPU-side memory for events As discrete graphics looms, we really need to stop storing CPU data structures in GPU memory. One of the most egregious instances of this was VkEvent where we had a CPU data structure living inside a dynamic state pool allocation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>	2020-05-04 14:06:27 +00:00
Jason Ekstrand	4ac4e8e11f	anv: Stop clflushing events They're allocated out of the dynamic state pool which is snooped. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>	2020-05-04 14:06:27 +00:00
Jason Ekstrand	a9158f7951	vulkan,anv: Add a common base object type for VkDevice We should keep this very minimal; I don't know that we need to go all struct gl_context on it. However, this gives us at least a tiny base on which we can start building some common functionality. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>	2020-05-04 14:06:27 +00:00
Dave Airlie	b2164320a0	i965: add support for gen 5 pipelined pointers to dump I wanted to see inside these, so added support to the dumper. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4846>	2020-05-03 05:47:16 +10:00
Nataraj Deshpande	49cc9e9526	anv: Disable extensions based on Android versions This extends commit `2243f0cd` for anv with additional extensions for Pie and Q versions. Fixes tests with 9_R11 CTS: dEQP-VK.api.info.android#no_unknown_extensions dEQP-VK.api.info.device#extensions. v2: Use snake_case function name (Jason Ekstrand) Drop Change-Id in commit (Kristian H. Kristensen) v3: Resolve meson-clang error for ANDROID_API_LEVEL. Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4827>	2020-05-01 21:04:26 +00:00
Nataraj Deshpande	a77cf797f1	anv: Limit vulkan version to 1.1 for Android Current Android dessert versions such as Pie, Q reject vulkan version > 1.1. Clamp the vulkan versions to 1.1 for platforms running these Android desserts. Fixes android.graphics.cts.VulkanFeaturesTest and dEQP-VK.api.info.device#properties. v2: Limit version with '!ANDROID' (Eric Engestrom and Tapani Pälli) Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4781>	2020-05-01 20:50:54 +00:00
Caio Marcelo de Oliveira Filho	e645bc6939	intel: Let drivers call brw_nir_lower_cs_intrinsics() The motivating factor is: this lowering may cause nir_intrinsic_load_local_group_size intrinsics to be added to the shader, and by moving this around we make it possible for the drivers to lower that intrinsic by themselves. Iris will do just that in a later patch for implementing variable group size. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:37 -07:00
Caio Marcelo de Oliveira Filho	2663759af0	intel/fs: Add and use a new load_simd_width_intel intrinsic Intrinsic to get the SIMD width, which is not always the same as subgroup size. Starting with a small scope (Intel), but we can rename it later to generalize if this turns out useful for other drivers. Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of a width passed as an argument. The pass also used to optimize load_subgroup_id for the case that the workgroup fitted into a single thread (it will be constant zero). This optimization moved together with lowering of the SIMD. This is a preparation for letting the drivers call it before the brw_compile_cs() step. No shader-db changes in BDW, SKL, ICL and TGL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:37 -07:00
Caio Marcelo de Oliveira Filho	4b000b491a	intel/fs: Add an option to lower variable group size in backend Adding this since Iris will handle variable group size parameters by itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:28 -07:00
Caio Marcelo de Oliveira Filho	0edb58a84e	intel/fs: Clean up variable group size handling in backend Just use the information from NIR shader_info. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:28 -07:00
Kenneth Graunke	615270502c	intel: Move anv_gem_supports_syncobj_wait to common code. This will let me use this in iris. We leave the existing anv function for anv_gem_stubs.c faking, but move the contents to a helper in a new src/intel/common/gen_gem.c file. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>	2020-05-01 19:00:02 +00:00
Kenneth Graunke	812cf5f522	anv: Include linux/sync_file.h instead of cut and pasting contents Linux 4.7 has been out for a long time, this is probably safe to depend on at this point, rather than cut and pasting the contents. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>	2020-05-01 19:00:02 +00:00
Caio Marcelo de Oliveira Filho	2a05ba5414	intel/dev: Bail when INTEL_DEVID_OVERRIDE is not valid Avoids surprises where you set an OVERRIDE but it gets ignored and the system PCI ID is used. Also fixes the bug where the invalid platform name error was printed too early, even when the passed platform was a PCI ID (which is also supported). For the case where euid != uid, a warning was added but the behavior wasn't changed: it is still going to fall back to the system PCI ID. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4841>	2020-05-01 10:12:01 -07:00
D Scott Phillips	65b05ebdda	anv,iris: Fix input vertex max for tcs on gen12 gen12 does away with the single patch dispatch mode for tcs, and increases some limits so that 8_patch mode can always work. Make the necessary changes so we don't try to fall back to single patch mode. Fixes KHR-GL46.tessellation_shader.single.max_patch_vertices and others Fixes: `44754279ac` ("intel/fs/gen12: Use TCS 8_PATCH mode.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4843>	2020-05-01 16:49:11 +00:00
D Scott Phillips	7bd15135a6	intel/fs: Update location of Render Target Array Index for gen12 Render Target Array Index has moved from R0.0[26:16] to R1.1[26:16] on gen12. Fixes dEQP-VK.multiview.input_attachments.* Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4836>	2020-05-01 08:48:22 -07:00
Jason Ekstrand	3fac55ce0d	Revert "anv/gen12: Temporarily disable VK_KHR_buffer_device_address (and EXT)" This reverts commit `c61ad77cd2`. We now no longer have a problem with these. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4819>	2020-04-30 14:45:50 +00:00
Jason Ekstrand	4985e380dd	intel/eu: Use non-coherent mode (BTI=253) for stateless A64 messages We don't care about full IA coherency since we always have the opportunity in GL or Vulkan to flush the data cache. Using IA-coherent mode is likely just making A64 access slower than it needs to be. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4819>	2020-04-30 14:45:50 +00:00
Lionel Landwerlin	0f4f1d70bf	intel: add stub_gpu tool Run shaderdb like this : intel_stub_gpu -p bxt ./run ./shaders/* List of platform names is available from gen_device_name_to_pci_device_id() (src/intel/dev/gen_device_info.c). v2: Add missing getparam support Raise max soft limit of file descriptors Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4594>	2020-04-30 11:32:54 +03:00
Lionel Landwerlin	8c3c1d8a99	intel/dev: print out error when platform is not found by name Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4594>	2020-04-30 11:32:54 +03:00
Francisco Jerez	0842758ec0	intel/ir: Update performance analysis parameters for memory fence codegen changes. The SFID field of the SHADER_OPCODE_MEMORY_FENCE and SHADER_OPCODE_INTERLOCK instructions now indicates the target function of the memory fence. Account the cycle-count cost to the right shared unit. Fixes: `f858fa26b4` ("intel/fs,vec4: Pull stall logic for memory fences up into the IR") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4817>	2020-04-29 23:40:36 +00:00
Jason Ekstrand	e581ddeeee	intel/fs: Don't delete coalesced MOVs if they have a cmod Shader-db results on ICL: total instructions in shared programs: 17133088 -> 17133287 (<.01%) instructions in affected programs: 61300 -> 61499 (0.32%) helped: 0 HURT: 199 This means it's likely fixing 199 bugs. :-) All the changed shaders are in Mad Max. It's surprisingly difficult to get the back-end compiler to generate a pattern that hits this; we don't tend to emit a lot of coalescable MOVs. The pattern in Mad Max that's able to hit it is fsign(fsat(x)) under the right conditions. Closes: #2820 Cc: mesa-stable@lists.freedesktop.org Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4773>	2020-04-29 16:45:51 +00:00
Tapani Pälli	1a33358b27	anv: remove assert from GetImageMemoryRequirements[2] This assert is actually correct but due to how android hardware buffer support is implemented we should remove it, otherwise debug build of mesa hits the assert with Android CTS tests. Test creates VkImage with non-external format and sets up VkExternalMemoryImageCreateInfo to indicate that image may be used with Android hardwarebuffer handle. Then test attempts to get image memory requirements. Problem with this is that we setup all android supporting images as having external format and thus hit the assert as the size has not been set yet. This is not a problem in practice since android will bind ahw memory with the image later on. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>	2020-04-29 08:30:42 +00:00
Caio Marcelo de Oliveira Filho	a3cba3c771	intel/fs: Only stall after sending all memory fence messages In Gen11+, when emitting a fence for both L3 and SLM, the generated code would look like SEND, MOV (for stall), SEND, MOV (for stall). This commit changes that so two SENDs are emitted before the MOVs for stall. This is similar to the approach used in Ivy Bridge for the render fence. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	f858fa26b4	intel/fs,vec4: Pull stall logic for memory fences up into the IR Instead of emitting the stall MOV "inside" the SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when creating the IR. For IvyBridge, every (data cache) fence is accompanied by a render cache fence, which is now explicit in the IR: two SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs). Because Begin and End interlock intrinsics are effectively memory barriers, move their handling alongside the other memory barrier intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish if we are going to use a SENDC (for Begin) or regular SEND (for End). This change is a preparation to allow emitting both SENDs in Gen11+ before we can stall on them. Shader-db results for IVB (i965): total instructions in shared programs: 11971190 -> 11971200 (<.01%) instructions in affected programs: 11482 -> 11492 (0.09%) helped: 0 HURT: 8 HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for instructions value: 0.66 1.84 95% mean confidence interval for instructions %-change: 0.01% 0.27% Instructions are HURT. Unlike the previous code, which used the `mov g1 g2` trick to force both `g1` and `g2` to stall, the scheduling fence will generate `mov null g1` and `mov null g2`. During review it was decided it was not worth keeping the special codepath for the small effect it will have. 
Shader-db results for HSW (i965), BDW and SKL don't have a change in instruction count, but do report changes in cycle count, showing SKL results below total cycles in shared programs: 341738444 -> 341710570 (<.01%) cycles in affected programs: 7240002 -> 7212128 (-0.38%) helped: 46 HURT: 5 helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154 helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95% HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362 HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08% 95% mean confidence interval for cycles value: -777.71 -315.38 95% mean confidence interval for cycles %-change: -1.42% -0.83% Cycles are helped. This seems to be the effect of allocating two registers separately instead of a single one with size 2, which causes different register allocation, affecting the cycle estimates. ICL also has no change in instruction count but reports negative changes in cycles total cycles in shared programs: 352665369 -> 352707484 (0.01%) cycles in affected programs: 9608288 -> 9650403 (0.44%) helped: 4 HURT: 104 helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101 helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49% HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48 HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45% 95% mean confidence interval for cycles value: 256.67 523.24 95% mean confidence interval for cycles %-change: 0.63% 1.03% Cycles are HURT. AFAICT this is the result of the case above. 
Shader-db results for TGL have similar cycles result as ICL, but also affect instructions total instructions in shared programs: 17690586 -> 17690597 (<.01%) instructions in affected programs: 64617 -> 64628 (0.02%) helped: 55 HURT: 32 helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3 helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74% HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2 HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69% 95% mean confidence interval for instructions value: -2.03 2.28 95% mean confidence interval for instructions %-change: -0.41% 0.15% Inconclusive result (value mean confidence interval includes 0). Now that more is done in the IR, more dependencies are visible and more SWSB annotations are emitted. Mixed with different register allocation decisions like above, some shaders will see more `sync nops` while others are able to avoid them. Most of the new `sync nops` are also redundant and could be dropped, which will be fixed in a separate change. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	0e96b0d6dd	intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Francisco Jerez	5e2a7e11b4	intel/ir: Remove scheduling-based cycle count estimates. The cycle count estimation logic part of the scheduler is now redundant with the shader performance modeling pass, and the estimates can be consolidated into the brw::performance analysis result object instead of being part of the CFG, which guarantees that the estimates cannot be accessed without previously calling the performance_analysis::require() method, which makes sure that the right analysis pass is executed at the right time if we don't already have up-to-date cached results. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	486f3b04a5	intel/ir: Pass block cycle count information explicitly to disassembler. So we can eventually remove the cycle count estimates from the CFG data structure and consolidate performance information in the brw::performance object. It would be cleaner to pass the brw::performance object directly to the disassembler but that isn't straightforward since the disassembler is built as a plain C file unlike the rest of the compiler back-end. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	6579f562c3	intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats. These should be more accurate than the current cycle counts, since among other things they consider the effect of post-scheduling passes like the software scoreboard on TGL. In addition it will enable us to clean up some of the now redundant cycle-count estimation functionality in the instruction scheduler. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	65342be3ae	intel/fs: Add INTEL_DEBUG=no32 debugging flag. This is useful in order to identify codegen issues caused by SIMD32. It doesn't currently have any effect on compute shaders since SIMD32 dispatch is only enabled for CS when it's strictly necessary to do so in order to support the workgroup size requested for the shader -- That might change in the future though when we hook up the SIMD32 heuristic to CS compilation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	14f0a5cf64	intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders. The heuristic enables the SIMD32 fragment shader based on whether the IR performance modeling pass predicts it to have greater throughput than the SIMD16 and SIMD8 variants of the same shader. It would be straightforward to do the same thing in order to control whether SIMD16 dispatch is enabled, but it's pending additional performance evaluation. The INTEL_DEBUG=do32 option is left around in order to force the SIMD32 shader to be used regardless of the result of the heuristic, since it's useful as a debugging aid e.g. in order to identify SIMD32-specific codegen issues which may be masked by the SIMD32 heuristic, or cases where the heuristic is incorrectly disabling SIMD32 shaders that offer a performance advantage. Currently this is only enabled on Gen6+, since SIMD32 codegen support is incomplete on earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	d6aa0c261f	intel/fs: Heap-allocate fs_visitors in brw_compile_fs(). This makes brw_compile_fs() look a bit more similar to brw_compile_cs(). It saves us three v*_shader_stats local variables, and will save us additional triplicated declarations as we start tracking IR performance analysis results. The triplicated cfg pointers are left around because they're set to NULL to mark specific dispatch modes as disabled (e.g. in order to enforce hardware restrictions). Doing the same thing with the visitor pointers would cause data leaks. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	188a3659ae	intel/ir: Import shader performance analysis pass. This introduces an analysis pass intended to estimate several performance statistics of the shader, including cycle count latency and throughput values, based on static modeling. It has instruction performance information more comprehensive than the current scheduling pass for all platforms between Gen4-11, and works on both the FS and VEC4 back-end. The most immediate purpose of this pass is to implement a heuristic meant to determine whether using SIMD32 dispatch for a fragment shader can be expected to help more than it hurts. In addition this will allow the effect of passes run after scheduling (e.g. the TGL software scoreboard pass and the VEC4 dependency control pass) to be visible in shader-db statistics. But that isn't the end of the story, other potential applications of this pass (not part of this MR) I've been playing around with are: - Implement a similar SIMD16 heuristic allowing the identification of inefficient SIMD16 fragment shaders. - Implement similar SIMD16 and SIMD32 heuristics for the compute shader stage -- Currently compute shader builds always use the SIMD16 shader if available and never use the SIMD32 shader unless strictly necessary, which is suboptimal under certain conditions. - Hook up to the instruction scheduler in order to improve the accuracy of its timing information. - Use as heuristic in order to drive the selection of scheduling modes (Matt was experimenting with that). - Plug to the TGL software scoreboard pass in order to implement a more effective SBID token allocation algorithm, since in general the optimal token allocation depends on the timings of all instructions in the program. - Use its bottleneck detection functionality in order to implement a heuristic computing a more optimal bound for the number of fragment shader threads executed in parallel (by adjusting the MaximumNumberofThreadsPerPSD control of 3DSTATE_PS). 
As a follow-up I'm planning to submit updated timing information for Gen12 platforms -- Everything else required to support Gen12 like SWSB handling is already included in this patch, but there were some IP concerns regarding the TGL timing parameters since they cannot currently be obtained with the documentation and hardware which is publicly available. The timing parameters for any previous Gen7-11 platforms can be obtained by anyone by sampling the timestamp register using e.g. shader_time, though I have some more convenient instrumentation coming up. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:03 -07:00
Francisco Jerez	c8ce1cfc9c	intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	bda1d72dd9	intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. This will be re-usable by the IR performance analysis pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	d2ed740795	intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	6310a05f68	intel/fs: Rename half() helpers to quarter(), allow index up to 3. Makes more sense considering SIMD32. Relaxing the assertion in brw_ir_fs.h will be required in order to avoid assertion failures on SNB with SIMD32 fragment shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	bdad7f429a	intel/ir: Add missing initialization of backend_reg::offset during construction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	e549e4f6c0	intel/fs/gen12: Fix Render Target Read header setup for new thread payload layout. In Gen12 the Poly 0 Info DWORD containing the Viewport Index and Render Target Index fields were moved from r0.0 to r1.1 in order to make room for dual-polygon dispatch. The render target message format was updated to expect that information in the same location, so we didn't need to make any changes for framebuffer fetch to work with SIMD8 and SIMD16 dispatch. Unfortunately that won't work with SIMD32, since the render target message header is assembled from r0 and r2 instead of r1, and the r2 thread payload wasn't updated with an additional copy of the same information. We need to fix things up manually instead. This avoids a handful of EXT_shader_framebuffer_fetch regressions in combination with SIMD32 fragment shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	72324035fb	intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32. This applies the same work-around I committed as `b84fa0b31e` "intel/fs/gen11: Work around dual-source blending hangs in combination with SIMD32." to Gen12, which seems to suffer from the same hardware bug found empirically. The failure mode seems to be identical. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:28 -07:00
Francisco Jerez	d6ae079771	intel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch. The Gen12 docs are rather contradictory regarding the dispatch configurations supported by the fragment shader -- The same table present in previous generations seems to imply that only one dispatch mode can be enabled when doing per-sample shading, but a restriction documented in the 3DSTATE_PS_BODY page implies the opposite: That SIMD32 can only be used in combination with some other dispatch mode. The latter seems to match the behavior of real hardware as I could tell from my testing: A bunch of multisample test-cases that do per-sample shading hang if we only provide a SIMD32 shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:28 -07:00
Jason Ekstrand	b43366497b	anv: Claim VK_EXT_robustness2 support Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>	2020-04-28 22:55:25 +00:00
Jason Ekstrand	b07d26be65	anv: Handle null vertex buffer bindings Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>	2020-04-28 22:55:25 +00:00
Jason Ekstrand	fd817291c7	anv: Handle NULL descriptors Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>	2020-04-28 22:55:25 +00:00
Jason Ekstrand	76d2772472	anv: Allow all clear colors for texturing on Gen11+ Starting with Gen11, we have two indirect clear colors: An unconverted float/int version which is used for rendering and a converted pixel value version which is used for texturing. Because the one used for texturing is stored as a single pixel of that color, it works no matter what format is being used. Because it's a simple HW indirect and doesn't involve copying surface states around, we can use it in the sampler without having to worry about surface states having out-of-date clear values. The result is that we can now allow any clear color when texturing. This cuts the number of resolves in a RenderDoc trace of Dota2 by 95% on Gen11+ (you read that right) and improves perf by 3.5%. It improves perf in a handful of other workloads by < 1%. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>	2020-04-28 22:45:39 +00:00
Jason Ekstrand	e63c662c26	anv: Use anv_layout_to_aux_usage for color during render passes Previously, we tried to treat color image layouts as a special case during render passes. This is largely an artifact of history as our initial understanding of Vulkan placed much more emphasis on render passes than our current understanding. The only real practical use for magic layouts in the middle of a render pass, as far as I can tell, is to allow more clear colors to get passed through to input attachments. However, most apps aren't very creative with their clear colors and very few of them (none coming from DXVK) actually use render passes in any interesting way. Therefore, the risk of being able to pass fewer clear colors through to input attachments should be minimal. There are, however, three very big advantages to this change: 1. We are now consistent in our handling of aux usage and layouts between color and depth/stencil. 2. We are now actually following the layout guidelines from the app and aren't nearly as likely to see strange behavior due to us overriding the image layouts manually. 3. It's more obviously correct. While I think our old render pass code was probably correct, it was full of corner cases and it's very possible that it was behaving badly in weird ways. This follows the Vulkan API much more blindly and, as such, is more likely to be correct and behave the same as other implementations. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>	2020-04-28 22:45:39 +00:00
Jason Ekstrand	30016f6e82	anv: Split color_attachment_compute_aux_usage in two In particular, we split out an anv_can_fast_clear_color_view helper which only cares about fast-clear and not aux_usage itself. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>	2020-04-28 22:45:39 +00:00
Jason Ekstrand	3fe45a9b6c	anv: Rework depth_stencil_attachment_compute_aux_usage Instead of making it a function that pretends to choose aux usage (which isn't what it does at all), make it a function which returns whether or not we want to do a fast clear. This is far more accurate to the purpose of the function. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>	2020-04-28 22:45:39 +00:00

5502 commits