This will happen automatically when they're waited on by the dummy
submit in wsi_common_queue_present().
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
If the executables are still hanging out,
anv_GetPipelineExecutableStatisticsKHR will try to dereference NULL
pointers in pipeline->shaders[MESA_SHADER_FRAGMENT].
At least in terms of fossil-db output, this matches the behavior from
before 73b3efcd59.
Fixes: 73b3efcd59 ("anv: Handle the null FS optimization after compiling shaders")
Closes: #6590
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16898>
We were using both uint32_t and anv_cmd_dirty_mask_t, this is
a cleanup making type usage consistent. Commit also changes type of
the mask to be enum anv_cmd_dirty_bits.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16849>
Fixes tests matching:
dEQP-VK.pipeline.extended_dynamic_state.cmd_buffer_start.*unused_ms
These tests bind mesh pipeline, immediately after that bind non-mesh
pipeline and expect that binding mesh pipeline was a no-op.
v2: do it in one place & add comment (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16811>
It looks like atomics are slow on compressed surfaces so when enabling
compression for storage images that can be possibly used for atomic
operation hinders performance. Lets just disable compression in this
scenario.
v2: Reword comment (Ken)
Allow mutable with 16/32/64 bits (Ken)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14712>
Now that the resulting xfb_info is stashed on the shader, we can put
this with all the other NIR stuff and only fetch it out at the last
minute when we upload the kernel.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16750>
Those shaders are just like the blorp ones.
v2: Use a single internal cache for blorp/RT (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7f1e82306c ("anv: Switch to the new common pipeline cache")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16741>
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.
Shader-db reports no instruction or cycle-count changes. However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.
v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization. However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
It turns out that we need a fragment shader for streamout. Whh? From
Lionel's reading of simulator sources, it seems the streamout unit is
looking at enabled next stages. It'll generate output to the clipper in
the following cases :
- 3DSTATE_STREAMOUT::ForceRendering = ON
- PS enabled
- Stencil test enabled
- depth test enabled
- depth write enabled
- some other depth/hiz clear condition
Forcing rendering without a PS seems like a recipe for hangs so it's
probably better to just enable the PS in this case.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Actually compile and cache the no-op fragment shader but remove it from
the pipeline if we determine it's a no-op. This way we always have it
even if it's not strictly needed.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Starting with Ivy Bridge, we implement alpha-to-coverage by writting
gl_SampleMask with a pattern based on alpha. This will show up in
wm_prog_data::uses_omask so we don't need to look at the key.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
The game Batman: Arkham Knight expects OpenGL behavior
with sample mask and multisampling which is different
from the Vulkan one.
This workaround fix changes key->ignore_sample_mask_out
value that is used for
prog_data->uses_omask definition in brv_fs.cpp(9740)
In that way prog_data->uses_omask also changes it value
and the cloak stops flickering.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6078
Signed-off-by: Viktoriia Palianytsia <v.palianytsia@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16551>
Production Tigerlake and DG1 hardware shouldn't need this workaround.
It was only needed on the very first steppings which never went public.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16575>
On Gfx12.5 products, we'll need to capture a couple of A counters that
are not captured in MI_RPC reports. Those are actually global,
previously all A counters were per context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
In the future we'll pull more information off devinfo.
v2: Constify pointers (Ian)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
We already had a little workaround for v3dv where, for some if its meta
ops, it had to bind a depth/stenicil image as color. Instead of
special-casing binding depth/stencil as color, let's flip on the
drier_internal flag and get rid of most of the checks in that case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16376>
With this option enabled range of input values for fsin and fcos is
limited to [-2*pi : 2*pi] by calculating the reminder after 2*pi modulo
division. This helps to improve calculation precision for large input
arguments on Intel.
-v2: Add limit_trig_input_range option to prog_key to update shader
cache (Lionel)
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>
We no longer emit STATE_BASE_ADDRESS in every batch on XeHP, so the
decoder might not know what the various base addresses are if it's only
looking at a single batch. Fortunately, they also never change, so we
can just emit them once here.
On earlier platforms, initializing them here should be harmless. We'll
emit STATE_BASE_ADDRESS if we change them, which will update these.
Thanks to Iván Briano for catching this.
Fixes: 8831cb38aa ("anv: Stop updating STATE_BASE_ADDRESS on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16287>
If the secondary command buffer executed used push constants on a
different set of stages than the primary is using, we may end up not
reallocating them for the primary, getting misrender artifacts at best,
or a nice GPU hang at worst.
Fixes the tests from a CTS from the future:
dEQP-VK.dynamic_rendering.random.*
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16439>
We use the non-strict algorithm (with parallelograms) prior to Gen10 for
wide lines. We can not advertise rectangularLines.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15432>
This is now unnecessary. Either an instruction is never dynamic and
it's emitted in genX_pipeline.c or it can be and it's emitted in
genX_cmd_buffer.c/gfx8_cmd_buffer/gfx7_cmd_buffer.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
On Gfx7 we can only give the sample location for a given multisample
number. This means everytime the multisampling value changes, we have
to re-emit the locations. It's fine because it's also where
(3DSTATE_MULTISAMPLE) the number of samples is stored.
On Gfx8+ though, 3DSTATE_MULTISAMPLE only holds the number of samples
and all the sample locations for all number of samples are located in
3DSTATE_SAMPLE_PATTERN. So to be more effecient there, we need to
track the locations for all sample numbers and compare new values with
the relevant sample count when touching the dynamic state.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
We don't know in what state the secondary buffer will leave the HW
when it ends. It's easier to consider everything needs to be reemitted
for now.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
device->l3_config is only valid on Gfx11+
This only fixes using GPU_TRACE=1
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 02a4d622ed ("anv: expose a couple of emit helper to build utrace buffer copies")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16291>
Discrete platforms don't have LLC, but on those, we mmap our buffers
with WC. So we shouldn't need to clflush there.
Anv already had a boolean field on the physical device to know whether
we need to use clflush(), based off the memory heaps available. So use
that instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15780>