We will be using the image layout. Store the full struct directly from
the user.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Due to recent commits, the sampler now bypasses the auxiliary HiZ buffer
when reading from a depth image subresource that is in the general
layout. Remove this unneeded resolve.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Validate the inputs, verify that this image has a depth
buffer, use gen_device_info instead of
v2:
- Add parenthesis (Jason Ekstrand)
- Make parameters const
- Use gen_device_info instead of gen
- Pass aspect to missed function in transition_depth_buffer
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to the PRM description of the Depth field:
"This field specifies the total number of levels for a volume texture
or the number of array elements allowed to be accessed starting at the
Minimum Array Element for arrayed surfaces"
However, ISL defines array_len as the length of the range
[base_array_layer, base_array_layer + array_len], so it already represents
a value relative to the base array layer like the hardware expects.
v2: Depth is defined as a U11-1 field, so subtract 1 from
the actual value (Jason)
This fixes a number of new CTS tests that would crash otherwise:
dEQP-VK.pipeline.render_to_image.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This just enables basic MSAA compression (no fast clears) for all
multisampled surfaces. This improves the framerate of the Sascha
"multisampling" demo by 76% on my Sky Lake laptop. Running Talos on
medium settings with 8x MSAA, this improves the framerate in the
benchmark by 80%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This allows the helper to check for llc instead of having to do it
manually at all the call sites.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a some rendering corruption in The Talos Principle
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This allows shaders to write to storage images declared with unknown
format if they are decorated with NonReadable ("writeonly" in GLSL).
Previously an image view would always use a lowered format for its
surface state, however when a shader declares a write-only image, we
should use the real format. Since we don't know at view creation time
whether it will be used with only write-only images in shaders, create
two surface states using both the original format and the lowered
format. When emitting the binding table, choose between the states
based on whether the image is declared write-only in the shader.
Tested on both Sascha Willems' computeshader sample (with the original
shaders and ones modified to declare images writeonly and omit their
format qualifiers) and on our own shaders for which we need support
for this.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compressing a render target and decompressing it in the same
single-subpass render pass may waste bandwidth. While this may be
beneficial in some circumstances, it does not help in all. Reclaims
about 1.95% FPS for Dota 2 on some configurations.
v2 (Jason Ekstrand):
- Provide a more thorough comment
- Enable CCS_D for input attachments
v3 (Jason Ekstrand):
- Provide performance numbers
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The term "lossless compression" could potentially mean multisample
color compression, single-sample color compression or HiZ because they
are all lossless. The term CCS_E, however, has a very precise meaning;
in ISL and is only used to refer to single-sample color compression.
It's also much shorter which is nice.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
v2: use define for buffer ID (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: use define for buffer ID (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For RGB formats in Vulkan, we use the corresponding RGBA format with a
swizzle of RGB1. While this swizzle is exactly what we want for
texturing, it's not allowed for rendering according to the docs. While
we haven't been getting hangs or anything, we should probably obey the
docs. This commit just sanitizes all render swizzles so that the alpha
channel maps to ALPHA.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is. We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards. The only thing the PRM says is that, on gen8+
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS
but experimentation has indicated that doing so does nothing whatsoever.
Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again. In particular:
1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
really know whether or not it's pipelined.
2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
is a compute shader.
3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
because the state may be getting cached there (we don't really know).
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed. Recently, when trying update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related. This commit fixes them.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Commit 2852efcda4 moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.
Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This is a better mapping to the Vulkan API and improves performance in
all tested workloads.
v2: Remove unnecessary image view aspect checks (Jason Ekstrand)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Store the current and requested depth stencil layouts so that we can
perform the appropriate HiZ resolves for a given transition while
recording a render pass.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We'll be using layout transitions later on in the series which can occur
within and between subpasses. Turn this on now to simplify the change
later.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're about to enable HiZ support for multiple subpasses. Use this field
to keep track of whether or not subpass operations should treat the
depth buffer as having an auxiliary HiZ buffer.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The helper doesn't provide additional functionality over the current
infrastructure.
v2: Add comment to anv_image::aux_usage (Jason Ekstrand)
v3: Clarify comment for aux_usage (Jason Ekstrand)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Match the comment above the field by using units of pixels and not HiZ
blocks.
Cc: mesa-stable@lists.freedesktop.org
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Enable multiple layers of the depth/stencil buffers to be accessible.
Fixes the crucible test, func.depthstencil.arrayed_clear.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When there are no framebuffer attachments, fb->width and fb->height will
be 0. Subtracting 1 results in 4294967295 which is too large for the
field, causing genxml assertions when trying to create the packet.
In this case, we can just program it to 1.
Caught by dEQP-VK.tessellation.tesscoord.triangles_equal_spacing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I
accidentally kept setting the SurfaceType to 2D in the stencil-only case
thanks to a copy+paste error.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The 1-D special case doesn't actually apply to depth or HiZ. I discovered
this while converting BLORP over to genxml and ISL. The reason is that the
1-D special case only applies to the new Sky Lake 1-D layout which is only
used for LINEAR 1-D images. For tiled 1-D images, such as depth buffers,
the old gen4 2-D layout is used and the QPitch should be in rows.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Otherwise, some pipe flushes may just never happen. This is unlikely to
cause problems depending on how the kernel schedules batches, but we
shouldn't count on it.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This commit adds the last remaining bits to support input attachments in
the Intel Vulkan driver. For color and depth attachments, we allocate an
input attachment surface state during vkCmdBeginRenderPass like we do for
the render target surface states. This is so that we can incorporate the
clear color and aux information as used in rendering. For stencil, we just
treat it like a regular texture because we don't there is no aux. Also,
only having to worry about at most one input attachment surface for each
attachment makes some of the vkCmdBeginRenderPass code simpler.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We were using VK_IMAGE_ACCESS_COLOR_ATTACHMENT_READ_BIT to detect an input
attachment read. We should use VK_IMAGE_ACCESS_INPUT_ATTACHMENT_READ_BIT
instead.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The != VK_SUCCESS case is really only capable of handling the one error.
This assert makes things a bit safer if something else goes wrong.
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>