fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	1b5cb92b62	anv: Apply cache flushes after setting index/draw VBs Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-05 10:59:10 -06:00
Jason Ekstrand	7ce39a55c1	anv: Always invalidate the VF cache in BeginCommandBuffer I think the reason why we only do this for primaries is that we didn't expect to have blorp calls in secondaries. However, you are allowed to have a full render pass in a secondary command buffer so resolves and clears can end up in there. We should just always invalidate. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-05 10:59:10 -06:00
Rafael Antognolli	d3e339364f	anv: Use 3DSTATE_CONSTANT_ALL when possible. Use this new instruction introduced in Gen12. The instruction itself is smaller, and it also allows us to emit a single instruction to all stages that have the same push constant buffers (e.g. when they don't have constant buffers). There's one restriction to use this instruction, though: the length field is only 5 bits long, so we need to check whether we can use it, and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32. v2: - Rebased on top of the lasted changes from Jason. - Added review suggestions by Caio. - Removed struct push_bos and merged some code into anv_nir_compute_push_layout(). v3: - Remove code churn due to gen8+ workaround in anv_nir_compute_push_layout(). This code has been removed in an earlier commit, and implemented in cmd_buffer_emit_push_constant(). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-04 20:48:25 +00:00
Rafael Antognolli	7d5da53d27	anv: Move code for emitting push constants into its own function. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-04 20:48:25 +00:00
Rafael Antognolli	67d2cb3e93	anv: Add get_push_range_address() helper. Add a helper function to get the push range address. Once we have a separate function for emitting gen12 push constants, we can use this helper and avoid duplicating code. v3: Do not add range->start to the address in gen7 (Caio). v4: Do not drop range->start from gen7 (Caio, Jason). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-04 20:48:25 +00:00
Rafael Antognolli	c0225a728e	anv: Move gen8+ push constant packet workaround. Store push_ranges in ascending order, and only "shift" them to the end of the array during state packet emission. We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet. So instead of applying the workaround here just for GEN < 12 (which requires and extra loop through all the ranges to figure out if we should shift them or not), we simply move the whole logic to the state emission code. At that point, in a later commit, we are already looping through all of the ranges anyway to check which packet we will be using, so we might as well implement the workaround there, where it is going to be used. v3: Move gen8+ workaround to the state emission code (Caio). v4: Add explanation of why we moved the workaroudn (Caio). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-04 20:48:25 +00:00
Jason Ekstrand	178a2946c0	anv: Respect the always_flush_cache driconf option Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-03 17:10:51 -06:00
Jason Ekstrand	a8965c076b	anv: Push constants are relative to dynamic state on IVB Fixes: `aecde2351` "anv: Pre-compute push ranges for graphics pipelines" Closes: #2136 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-26 22:15:54 +00:00
Rafael Antognolli	dadb6ebbd1	intel: Add workaround for stencil state. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-11-19 21:43:09 +00:00
Jason Ekstrand	fdaf8144a8	anv: Emit a NULL vertex for zero base_vertex/instance If both are zero (the common case), we can emit a null vertex buffer rather than emitting a vertex buffer with zeros in it. The packing of the VERTEX_BUFFER_STATE is faster because no relocation is emitted and we can avoid creating the vertex buffer which means one less anv_state_stream_alloc. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	98dc179c1e	anv: More carefully dirty state in BindPipeline Instead of blindly dirtying descriptors and push constants the moment we see a pipeline change, check to see if it actually changes the bind layout or push constant layout. This doubles the runtime performance of one CPU-limited example running with the Dawn WebGPU implementation when running on my laptop. NOTE: This effectively reverts `beca63c6c0`. While it was a nice optimization, it was based on prog_data and we can't do that anymore once we start allowing the same binding table to be used with multiple different pipelines. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	22f16ff54a	anv: More carefully dirty state in BindDescriptorSets Instead of dirtying all graphics or all compute based on binding point, we're now much more careful. We first check to see if the actual descriptor set changed and then only dirty the stages used by that descriptor set. For dynamic offsets, we keep a bitfield per-stage of which offsets are actually used in that stage and we only dirty push constants and descriptors if that stage has dynamic offsets AND those offsets actually change. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	ca8117b5d5	anv: Use a switch statement for binding table setup It theoretically could be more efficient but the real point here is that it's no longer really a matter of dealing with special cases and then the "real" thing. The way we're handling binding tables, it's more of a multi-step process and a switch is more natural. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	ca91ab8015	anv: Re-arrange push constant data a bit This moves the compute stuff into a anv_push_constants::cs sub-struct. It also moves dynamic offsets into the push constants. This means we have to duplicate the data per-stage but that doesn't seem like the end of the world and one day we may wish to make dynamic offsets per-stage anyway. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	aecde23519	anv: Pre-compute push ranges for graphics pipelines It turns off that emitting push constants is one of the hottest paths in the driver and ANY work we do there costs us. By pre-computing things a bit ahead of time, we shave 5% off the runtime of a CPU-limited example running with the Dawn WebGPU implementation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	4b392ced2d	anv: Stop bounds-checking pushed UBOs The bounds checking is actually less safe than just pushing the data. If the bounds checking actually ever kicks in and it's not on the last UBO push range, then the shrinking will cause all subsequent ranges to be pushed to the wrong place in the GRF. One of the behaviors we definitely don't want is for OOB UBO access to result in completely unrelated UBOs returning garbage values. It's safer to just push the UBOs as-requested. If we're really concerned about robustness, we can emit shader code to do bounds checking which should be stupid cheap (a CMP followed by SEL). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	ebad00d9e7	anv: Delete dead shader constant pushing code As of `2d78e55a8c`, nir_intrinsic_load_constant with a constant offset is constant-folded so we should never end up with any that trigger brw_nir_analyze_ubo_ranges. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	0709c0f6b4	anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layout This lets us stop tracking the pipeline layout. It also means less indirection on a very hot path. As an extra bonus, we can make some of our data structures smaller. No measurable CPU overhead improvement. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Jason Ekstrand	fa120cb31c	anv: Input attachments are always single-plane Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Rafael Antognolli	d4f628235e	anv: Use mocs settings from isl_dev. v2: Remove device->default_mocs and external_mocs (Jason). Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-12 20:41:52 +00:00
Lionel Landwerlin	c1c346f166	anv: implement VK_KHR_separate_depth_stencil_layouts v2: Use ternary to simplify code (Jason) v3: Reorder switch cases to follow existing section ordering (Nanley) Add missing comment in cmd_buffer_end_subpass() about new layout (Nanley) v4: Fix layout comparison for stencil case (Nanley) Update a few more comments (Nanley) Move VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL_KHR in color attachment case for future stencil-CCS support (Nanley) v5: Missed comments update (Nanley) Updated relnotes.txt (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-11-06 20:13:30 +00:00
Jason Ekstrand	f60ef0fff4	anv: Move the RT BTI flush workaround to begin_subpass Now that we're no longer compacting binding table entries, the only time they can possibly change is when we actually switch subpasses. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-10-31 21:07:15 +00:00
Jason Ekstrand	853d3b59fd	anv: Allocate misc BOs from the cache Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-10-31 13:46:09 +00:00
Jason Ekstrand	0a6d2593b8	anv: Allocate descriptor buffers from the BO cache Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-10-31 13:46:09 +00:00
Jason Ekstrand	c4be72934e	anv: Fix a relocation race condition Previously, we would read the offset from the BO in anv_reloc_list_add to generate the presumed offset and then again in the caller to compute the 64-bit address to write into the buffer. However, if the offset somehow changed between these two points, the presumed offset would no longer match the written offset. This is unlikely to actually ever be a problem in practice because the presumed offset gets recorded first and so if the written address is wrong then the presumed offset is almost certainly wrong and the relocation will trigger. However, it's much safer to simply have anv_reloc_list_add return the 64-bit address. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-10-31 13:46:08 +00:00
Rafael Antognolli	3c317e8187	anv: Add Tile Cache Flush for Unified Cache.	2019-10-30 19:51:03 +00:00
Jason Ekstrand	beca63c6c0	anv: Avoid emitting UBO surface states that won't be used This shaves around 4-5% off of a CPU-limited example running with the Dawn WebGPU implementation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-30 16:05:57 +00:00
Plamena Manolova	4fe2317601	anv: Implement new way for setting streamout buffers. For gen12 we set the streamout buffers using 4 separate commands instead of 3DSTATE_SO_BUFFER. Signed-off-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-10-29 19:21:20 +00:00
Nanley Chery	6451008e8b	intel: Refactor blorp_can_hiz_clear_depth() Prepare this function to be used in iris and to handle new Gen12 behavior. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-28 10:47:06 -07:00
Nanley Chery	300d77c2fa	anv/cmd_buffer: Don't assume CCS_E includes CCS_D There's no longer a clear-only compression mode of CCS on Gen12+. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-28 10:47:05 -07:00
Jordan Justen	7737f56544	anv/gen12: Write GFX_AUX_TABLE base address register Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-10-28 00:09:14 -07:00
Lionel Landwerlin	2b5f30b1d9	anv: implement VK_INTEL_performance_query v2: Introduce the appropriate pipe controls Properly deal with changes in metric sets (using execbuf parameter) Record marker at query end v3: Fill out PerfCntr1&2 v4: Introduce vkUninitializePerformanceApiINTEL v5: Use new execbuf extension mechanism v6: Fix comments in genX_query.c (Rafael) Use PIPE_CONTROL workarounds (Rafael) Refactor on the last kernel series update (Lionel) v7: Only I915_PERF_IOCTL_CONFIG when perf stream is already opened (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-10-23 05:41:15 +00:00
Jordan Justen	9790cfcefa	anv,iris: L3ALLOC register replaces L3CNTLREG for gen12 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-06 13:11:25 -07:00
Francisco Jerez	c2fe7a0fb8	anv/gen9: Optimize slice and subslice load balancing behavior. See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. According to Jason, improves Aztec Ruins performance by 2.7%. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) v2: Undo CPU performance micro-optimization done in i965 and iris due to lack of data justifying it on anv. Use cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe control command directly. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-12 14:40:21 -07:00
Lionel Landwerlin	cefb4341b7	anv: drop unused code We stopped using this when we moved to Jason's mi_builder. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-09 17:01:38 +03:00
Jason Ekstrand	bc612536eb	anv: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D There is an object-level preemption workaround which requires this. However, even without object-level preemption, we seem to have issues with geometry flickering when 3D and compute are combined in the same batch and this appears to fix it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109630 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111267 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-06 05:46:28 +00:00
Eric Engestrom	e775b938b2	intel: drop incorrect MAYBE_UNUSED All these are actually always used. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Sagar Ghuge	806e5a37ed	anv: Implement VK_KHR_imageless_framebuffer v2: Pass pointer instead of struct instance (Lionel) v3: 1) Fix small nits (Jason) 2) Add way to detect anv_framebuffer don't have attachments (Jason) 3) Get rid of unncessary pNext chain walk (Jason) 4) Keep framebuffer instance in anv_cmd_state (Jason) v4: 1) Dump attachments from cmd_buffer (Jason) v5: 1) Fix condition check and add assertion (Lionel) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-23 10:01:45 -07:00
Jason Ekstrand	6a2ff217b8	anv: Set Stateless Data Port Access MOCS This is the MOCS setting used for the A64 stateless messages which we sometimes use for SSBO operations. Fixes: `48ed2a7bb0` "anv: Implement VK_EXT_buffer_device_address" Fixes: `79fb0d27f3` "anv: Implement SSBOs bindings with GPU addr..." Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-10 19:35:23 +00:00
Jason Ekstrand	9672b7044c	anv: Set STATE_BASE_ADDRESS upper bounds on gen7 This should fix floating-point border color on all gen7 HW. Integer is still thoroughly busted on gen7 because it doesn't exist on IVB and it's crazy on HSW. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-17 18:53:07 -05:00
Jason Ekstrand	f3ea0cf828	anv: Add stencil texturing support for gen7 Intel hardware didn't get support for sampling from W-tiled (required for stencil) images until Broadwell so we can't directly sample from stencil. Instead, if we want to support stencil texturing on gen7 hardware, we have to keep a texture-capable shadow copy around and use BLORP to update when stencil changes. The one thing this commit does not implement is self-dependencies with stencil input attachments. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	2b736d9e6c	anv/cmd_buffer: Add a stencil transition helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	86fc268142	anv/blorp: Take an aspect in anv_image_copy_to_shadow Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Ville Syrjälä	6230bfeb65	anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7 Modern DXVK requires event support [1], but looks like it only uses vkCmdSetEvent() + vkGetEventStatus(). So we can just borrow the relevant code from gen8, leaving CmdWaitEvents still unimplemented. [1] `8c3900c533` v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-11 16:25:07 -05:00
Nanley Chery	b4198e792c	anv/cmd_buffer: Initalize the clear color struct for CNL+ On CNL+, the clear color struct is composed of RGBA channel values and fields which are either reserved by the HW or used to control fast-clears. Currently anv initializes the channel values to zero and allows the other fields to be undefined. Satisfy the MBZ field requirements by removing an optimization that doesn't hold true for CNL+ and pulling in the number of dwords to initialize from ISL. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-07 18:43:06 +00:00
Caio Marcelo de Oliveira Filho	f7d53fffa2	anv: Remove special allocation for anv_push_constants The key reason for that mechanism is gone: all the extra optional data that could be in the anv_push_constants was moved elsewhere. At this point, just put anv_push_constants directly in anv_cmd_state (part of anv_cmd_buffer). v2: Remove a NULL check we don't need anymore in anv_cmd_buffer_push_constants(). (Lionel) Fix size we consider for valid push params. (Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 19:01:14 -07:00
Jason Ekstrand	e6803f6b6f	anv: Use bindless textures and samplers This commit changes anv to put bindless handles and sampler pointers into the descriptor buffer and use those instead of bindful when we run out of binding table space. This "spilling" of descriptors allows to to advertise an almost unbounded number of images and samplers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	3b755b52e8	anv: Put image params in the descriptor set buffer on gen8 and earlier This is really where they belong; not push constants. The one downside here is that we can't push them anymore for compute shaders. However, that's a general problem and we should figure out how to push descriptor sets for compute shaders. This lets us bump MAX_IMAGES to 64 on BDW and earlier platforms because we no longer have to worry about push constant overhead limits. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	83b943cc2f	anv: Make all VkDeviceMemory BOs resident permanently We spend a lot of time in the driver adding things to hash sets to track residency. The reality is that a properly built Vulkan app uses large memory objects and sub-allocates from them. In a typical frame, most of if not all of those allocations are going to be resident for the entire frame so we're really not saving ourselves much by tracking fine-grained residency. Just throwing everything in the validation list does make it a little bit more expensive inside the kernel to walk the list and ensure that all our VA is in order. However, without relocations, the overhead of that is pretty small. If we ever do run into a memory pressure situation where the fine- grained residency could even potentially help, we would likely be swapping one page out to make room for another within the draw call and performance is totally lost at that point. We're better off swapping out other apps and just letting ours run a whole frame. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	0d6dea0ac8	anv/cmd_buffer: Use gen_mi_sub instead of gen_mi_add with a negative Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00

... 14 15 16 17 18 ...

1117 commits