We were already doing this for some packets to keep the lines shorter.
We may as well just do it for all of them.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Initially, these just contain the pipeline in a base struct.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
There are several places where we'd already saved the pipeline off to a
temporary variable but, due to an artifact of history, weren't actually
using that temporary everywhere. No functional change.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This change introduce the concept of planes for image & views. It
matches the planes available in new formats.
We also refactor depth & stencil support through the usage of planes
for the sake of uniformity. In the backend (genX_cmd_buffer.c) we have
to take some care though with regard to auxilliary surfaces.
Multiplanar color buffers can have multiple auxilliary surfaces but
depth & stencil share the same HiZ one (only store in the depth
plane).
v2: by Jason
Remove unused aspect parameters from anv_blorp.c
Assert when attempting to resolve YUV images
Drop redundant logic for plane offset in make_surface()
Rework anv_foreach_plane_aspect_bit()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In Vulkan, for 'z' (depth) component, the scale and translate values
for the viewport transformation are:
pz = maxDepth - minDepth
oz = minDepth
zf = pz × zd + oz
Being zd, the third component in vertex's normalized device coordinates.
Fixes: dEQP-VK.draw.inverted_depth_ranges.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
The functions we're marking as UNUSED in genX_pipeline.c are used only
when compiling for particular generations.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Unless you have data, the compiler knows better than you whether a
function should be inlined.
No difference in the resulting binary with gcc-6.3.0 or clang-4.0.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Now that the state stream is allocating off of the state pool, there's
no reason why we need the block pool to be separate.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
We can just use the new CHVLineWidth field rather than an entirely
different generation's packing function.
v2: Inline the function (requested by Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This allows the helper to check for llc instead of having to do it
manually at all the call sites.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Unfortunately, this doesn't substantially improve the performance of any
known apps. With Dota 2 on my Sky Lake gt4, it seems help by somewhere
between 0% and 1% but there's enough noise that it's hard to get a clear
picture.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Vulkan doesn't have a stencilWriteEnable bit like it does for depth.
Instead, you have a stencil mask. Since the stencil mask is handled as
dynamic state, we have to handle it later during command buffer
construction. This, combined with a later commit, seems to help Dota2
on my Broadwell GT3e desktop by a couple percent because it allows the
hardware to move the depth and stencil writes to early in more cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
As per VK_KHR_maintenance1, setting a negative height in the viewport
can be used to get flipped coordinates. This is, aparently, very useful
when porting D3D apps to Vulkan. All we need to do to support this is
to make sure we actually set the min and max correctly.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We'll be using layout transitions later on in the series which can occur
within and between subpasses. Turn this on now to simplify the change
later.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The helper doesn't provide additional functionality over the current
infrastructure.
v2: Add comment to anv_image::aux_usage (Jason Ekstrand)
v3: Clarify comment for aux_usage (Jason Ekstrand)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cleaner this way and we avoid including gen9_pack.h when we compile with
gen8_pack.h. We also avoid the if (cherryview) condition for non-gen8
gens that don't need it.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The HZ sequence modifies less state than the blorp path and requires
less CPU time to generate the necessary packets.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Move the assignment to a less surprising location.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This behavior differs from what's described in the PRMs and was
observed by analyzing CTS test results.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
With one small genxml change, the two versions were basically identical.
The only differences were one #define for HSW+ and a field that is missing
on Haswell but exists everywhere else.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If we don't, we can end up with corruption in the portion of the depth
buffer that lies outside the render area when we do a HiZ resolve at the
end. The only reason we weren't seeing this before was that all of the
meta-based clears such as VkCmdClearDepthStencilImage were internally using
HiZ so the HiZ buffer never truly got out-of-sync. If the CTS ever tested
a depth upload (which doesn't care about HiZ) and then a partial render we
would have seen problems. Soon, we will be using blorp to do depth clears
and it won't bother with HiZ so we would get CTS regressions without this.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery:
(rebase)
- Resolve conflicts with new anv_batch_emit macro
(amend)
- Handle a QPitch TODO
- Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
- Only use HiZ for single-subpass renderpasses
- Emit the HiZ instruction before the stencil instruction to follow the
optimized clear sequence specified in the PRMs
- Don't modify clear params
- Enable resolves when a HiZ buffer is used to ensure depth buffer validity
Provides an FPS increase of ~15% on the Sascha triangle and multisampling
demos.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Create a function that performs one of three HiZ operations -
depth/stencil clears, HiZ resolve, and depth resolves.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is the only remaining part of genX_l3.c and there's really no good
reason for it to be in its own file.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is in contrast to emitting it directly in vkCmdPipelineBarrier. This
has a couple of advantages. First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs. Second, it allow us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Mark variables MAYBE_UNUSED to avoid unused-but-set-variable warnings
in release build.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Users should never provide a scissor or viewport count of 0 because
they are required to set such state in a graphics pipeline. This
behavior was previously only used in Meta, which actually just
disables those hardware operations at pipeline creation time.
Kristian noticed that the current assignment of viewport count
reduces the number of viewport uploads, so it is not removed.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The programming of the L3 Cache registers should match the previous
manually packed LRI values.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>