Initially, these just contain the pipeline in a base struct.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
There are several places where we'd already saved the pipeline off to a
temporary variable but, due to an artifact of history, weren't actually
using that temporary everywhere. No functional change.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This splits anv_cmd_state_reset into separate init and finish functions.
This lets us share init code with cmd_buffer_create. This potentially
fixes subtle bugs where we may have missed some bit of state that needs
to get initialized on command buffer creation.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Meta has been gone for a long time.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This is a legacy left-over from the mechanism we used to use to handle
scratch. The new (and better) mechanism doesn't use this.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This prevents an assert when running one unreleased Vulkan game.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Technically, the Vulkan spec requires that we return valid entrypoints
for all core functionality and any available device extensions. This
means that, for gen-specific functions, we need to return a trampoline
which looks at the device and calls the right device function. In 99%
of cases, the loader will do this for us but, aparently, we're supposed
to do it too. It's a tiny increase in binary size for us to carry this
around but really not bad.
Before:
text data bss dec hex filename
3541775 204112 6136 3752023 394057 libvulkan_intel.so
After:
text data bss dec hex filename
3551463 205632 6136 3763231 396c1f libvulkan_intel.so
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The Vulkan spec annoyingly requires us to track what core version and
what all extensions are enabled and only advertise those entrypoints.
Any call to vkGet*ProcAddr for an entrypoint for an extension the client
has not explicitly enabled is supposed to return NULL.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This lets us move a bunch of stuff out of codegen and back into
anv_device.c which is a bit nicer.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This removes some redundant code between libanv_common, libvulkan_intel,
and libvulkan_intel_test.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The new anv_extensions_gen.py is the code generator while the old
anv_extensions.py file is purely declarative.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
vk_error() is a macro that calls __vk_errorf() with instance == NULL.
Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks
to vk_debug_error(), which segfaults as this pointer is invalid but not
NULL.
Fixes: e5b1bd6ab8 "vulkan: move anv VK_EXT_debug_report implementation to common code."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From the Vulkan spec with KHX extensions:
"If queries are used while executing a render pass instance that has
multiview enabled, the query uses N consecutive query indices
in the query pool (starting at query) where N is the number of bits
set in the view mask in the subpass the query is used in.
How the numerical results of the query are distributed among the
queries is implementation-dependent. For example, some implementations
may write each view's results to a distinct query, while other
implementations may write the total result to the first query and write
zero to the other queries. However, the sum of the results in all the
queries must accurately reflect the total result of the query summed
over all views. Applications can sum the results from all the queries to
compute the total result."
In our case we only really emit a single query (in the first query index)
that stores the aggregated result for all views, but we still need to manage
availability for all the other query indices involved, even if we don't
actually use them.
This is relevant when clients call vkGetQueryPoolResults and pass all N
queries to retrieve the results. In that scenario, without this patch,
we will never see queries other than the first being available since we
never emit them.
v2: we need the same treatment for timestamp queries.
v3 (Jason):
- Better an if instead of an early return.
- We can't write to this memory in the CPU, we should use
MI_STORE_DATA_IMM and emit_query_availability (Jason).
v4 (Jason):
- No need to take the value to write as parameter, just hard code it to 0.
Fixes test failures in some work-in-progress CTS multiview+query tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.
This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From Vulkan spec:
"descriptorCount is the number of descriptors contained in the binding,
accessed in a shader as an array. If descriptorCount is zero this
binding entry is reserved and the resource must not be accessed from
any stage via this binding within any pipeline using the set layout."
Fixes:
dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
For things like:
loop
x = func()
list += x
end
just do:
loop
list += func()
end
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Don't use intermediate variables, use consistent whitespace.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Currently the meosn build has a mix of two styles:
arg : [foo, ...
bar],
and
arg : [
foo, ...,
bar,
]
For consistency let's pick one. I've picked the later style, which I
think is more readable, and is more common in the mesa code base.
v2: - fix commit message
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.
This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.
v2 (Jason Ekstrand):
- Reset to invalid values instead of pulling from the secondary
- Change the comment to be more descriptive
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
anv_extensions usage from anv_icd was bringing the unwanted dependency
of mako templates for the latter. We don't want that since it will
force the dependency even for distributable tarballs which was not
needed until now.
Jason suggested this approach.
v2: Patch simplification (Jason).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
"The maxDescriptorSet* limit is n times the corresponding
maxPerStageDescriptor* limit, where n is the number of shader stages
supported by the VkPhysicalDevice. If all shader stages are supported,
n = 6 (vertex, tessellation control, tessellation evaluation,
geometry, fragment, compute)."
Fixes:
dEQP-VK.api.info.device.properties
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Apparently, Geminilake requires you to whack a chicken bit to select
either compute or tessellation mode for barriers. The recommendation
is to switch between them at PIPELINE_SELECT time.
We may not need to do this all the time, but I don't know that it hurts
either. PIPELINE_SELECT is already a pretty giant stall.
This appears to fix hangs in tessellation control shaders with barriers
on Geminilake. Note that this requires a corresponding kernel change,
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
in order for the register write to actually happen. Without an updated
kernel, this register write will be noop'd and the fix will not work.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.
If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.
This improves performance by ~13% on an internal benchmark on Skylake.
v2: Use anv_cmd_buffer_get_depth_stencil_view().
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.
This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.
Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes hangs seen due to the lock not being released here.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Older OpenGL defines two equations for converting from signed-normalized
to floating point data. These are:
f = (2c + 1)/(2^b - 1) (equation 2.2)
f = max{c/2^(b-1) - 1), -1.0} (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2. DirectX uses equation
2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.
This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.
Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile. That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.
With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules. Zero cases
have been reported of this being a problem. However, we've received a
report that following the old rules breaks expectations. SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:
https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405
So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.
If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too. We'd
likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):
draw-vertices-2101010: Accept either SNORM conversion formula.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state. We want those captured too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed
in the passed VkClearRect struct.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
outputs when having arrays (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, every thing needs to be 32-byte aligned. Currently, we
only require 16-byte which is too small.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to do this we have to modify push constant set up to handle
ranges. We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>