Dual source blending when one of the sources is not written to leaves
those values undefined, but the other should still be valid.
By omitting unwritten outputs, we ended up not writing anything at all
for the case that OUT1 is written to but OUT0 is undefined.
Fixes new CTS tests: dEQP-VK.pipeline.*.blend.dual_source.undefined_output.first*
Cc: mesa-stable
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40357>
When there is no trace pointer, there is usually a another tracepoint
being emitted (see STATE_BASE_ADDRESS,
3DSTATE_BINDING_TABLE_POOL_ALLOC emission).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40503>
In commit 10b5b279a4 ("anv: Fix CmdResetEvent2() with RESOURCE_BARRIER::Wait stage == none")
I haved added assert to catch invalid cases but looks like we have several tests
affected by that problem causing crashes in debug builds.
So here I'm removing those asserts(), will then work on all the fixes and bring
it back.
Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40476>
On integrated platforms, we have issue where L3 cache not being coherent
with CS and it forces us to push data out L3.
To avoid data cache flush, let's write the IR header with BLORP shader.
There is a small shader launch latency but eventually that should not
matter because writing data with CS (MI_STORE) commands is slower than
shader execution when we consider large number of BVH tree getting
built.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39971>
Add dedicated BLORP op enums so clear paths can be represented
precisely.
This is enum-only groundwork; behavior and trace output are wired in
follow-up commits.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40414>
The runtime builds a final pipeline state with pointers to structures
coming from the associated pipelines libraries.
So far it has considered that the viewMask was part of a structure
together with the rest of the renderpass information. This information
can be specified in pre-raster, fragment & color-output state groups
and it was assumed would be consistent for all 3. And the runtime
currently takes the pointer to the structure from the last pipeline
library (color output).
Some coming spec/cts will clarify that the viewMask only needs to be
specified for pre-raster & fragment groups, making the value in the
color-output group untrustworthy.
This change creates a new state structure to hold the viewMask on its
own so it is only gather on pre-raster & fragment groups.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Aitor Camacho <aitor@lunarg.com> (kosmickrisp)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (turnip)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Frank Binns <frank.binns@imgtec.com> (powervr)
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (panvk)
Royaled-yes-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39940>
CmdResetEvent2() was calling anv_add_pending_pipe_bits() with no dst_stages
stages causing RESOURCE_BARRIER::Wait stage == none, what causes a GPU hang in
NVL-P simulator.
So here setting dst_stages to VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT and adding
an assert in resource_barrier_wait_stage() to catch hw_stage == 0.
This fixes crucible func.event.cmd_buffer.q0 in simulator.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40445>
RADV wants to abstract the compiler from any instance/device/pdev
objects.
The previous NULL check for instance seems to be useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40379>
Those values trace back to 2015, pre Vulkan 1.0 release. I have no
idea why it was set to this, except maybe the HALIGN_128 of
RENDER_SURFACE_STATE.
Anyway, discussing this with Nanley, we don't think 128bytes is more
optimal than 64bytes. Nanley suggested the lowest value could be
16bytes for the fixed functions inside the GPU (sampler, dataport),
but a cacheline probably makes more sense for the memory interface.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40363>
They all do exactly the same thing, except that GS multiplies by an
extra factor, and TCS has urb_read_length == 0 so it skips one line.
No need for four copies.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
Just return the register instead of having multiple functions stash the
register in an array of registers. Way too much hoopla here.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
This was used for Gfx4-5. Since then, we're just passing around a
boolean that nobody wants. Even if someone did, a better plan is to
just check nir->info directly.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
When running "./deqp-vk -n dEQP-VK.memory.binding.maintenance6*", we
get tons of:
MESA-INTEL: debug: anv_bind_image_memory: ignored VkStructureType
VK_STRUCTURE_TYPE_BIND_MEMORY_STATUS(1000545002)
The function does not ignore VK_STRUCTURE_TYPE_BIND_MEMORY_STATUS: it
looks for it before the main pNext loop. The pNext loop we have there
calls vk_debug_ignored_stype(), which complains about the fact that
we, allegedly, ignore VK_STRUCTURE_TYPE_BIND_MEMORY_STATUS. Move the
code where we find bind_status to the loop so it doesn't complain
anymore.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40351>
Fixes new CTS tests.
Similar to a previous change : 5bf3546cc6 ("anv: Use companion cmd
buffer for CCS and MCS image barriers")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40332>
When copying data between two surfaces, independently increase the size
of each surface's format (bits-per-pixel) as alignment constraints
allow. Adjust the other surface parameters and blorp_copy() parameters
accordingly.
This fixes copies between the 16bpp YCRCB formats and 32bpp formats:
dEQP-VK.ycbcr.single_plane_copy.linear.linear.r8g8b8a8_to_g8b8g8r8_422
This new test failure was reported by Iván Briano.
More generally, this increases the efficiency of our copies. As shown in
the configuration pages of the PRMs, our sampler is able to fetch texels
at a fixed rate of texels / clock regardless of the texel size
(presumably our rendering hardware has similar behavior). By using the
largest texel size possible, we can transfer more bits / clock.
Improves the performance of a number of traces in the performance CI for
BMG:
* TotalWarWarhammer3 +2.24%
* Payday3 +1.87%
* BaldursGate3 +1.34%
* Control +1.25%
* TotalWarPharaoh +1.22%
Four additional traces are helped between +0.44% and +0.96%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
Don't use it without ISL_AUX_USAGE_STC_CCS. With a future patch, this
will allow blorp_copy() calls to increase the size of the surface format
for CPB.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
blorp_copy() will soon start changing the format in a way which drivers
cannot rely on to do things like manage the texture cache (see iris).
Narrow down the scope of blorp_copy_get_formats() and
blorp_copy_get_color_format() such that the returned value can only be
trusted if compression would be enabled on each image.
By doing this (and adapting iris to reflect this), we'll get the
required flushes on the platforms which need
WaSamplerCacheFlushBetweenRedescribedSurfaceReads:
* On the platforms which need the workaround for all formats,
blorp_copy() will stick with the queried format on compressed
surfaces.
* On the platforms which need the workaround when switching from ASTC
and non-ASTC formats, blorp_copy() may actually change the queried
format on compressed surfaces. This is not a problem, because
surfaces which may be read with ASTC formats are not compressible.
Prevents gfx9 from failing tests under:
* KHR-GL46.copy_image.functional_src_target_texture_2d_array_src_format_r3_g3_b2*
* KHR-GL46.copy_image.functional_src_target_texture_2d_array_src_format_rgb5*
* KHR-GL46.copy_image.functional_src_target_texture_2d_array_src_format_rgba2*
* KHR-GL46.copy_image.functional_src_target_texture_2d_array_src_format_rgba4*
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
With upcoming blorp_copy() changes, this avoids the following failures
with zink on gfx9:
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8_2d_array
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8_snorm_2d_array
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8i_2d_array
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8ui_2d_array
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
blorp_copy() will soon gain the ability to increase the format bpb.
Prepare anv by replicating the clear color pixel on gfx12.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
We're going to start changing the surface format during blorp_copy().
Changing the surface format could lead to incorrect image alignment
parameters, so return a fixed halign and valign for images with a single
subresource. That's all that will be needed for the upcoming
blorp_copy() changes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
Aux-tt alignment only applies to the beginning of the resource. Drop it
if we're pointing to an image that is not in the first tile of the
image. Likewise for the alignment we add for sequential multi-engine
access.
We allow sparse on 1D images. When getting an image from such a surface,
the alignment likely won't be aligned to 64KB. So, in this case, remove
the flag to avoid the alignment expectation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
Increase the scope of Yf/Ys miptail workarounds to drop the dependency
on format type (compressed or uncompressed) and make this information
more publically accessible. If I recall correctly, the affected tests
only performed blorp_copy() uploads and downloads and never accessed
images with compressed formats. So, we likely should be increasing the
scope.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
Fixes the following test case on ICL:
$ INTEL_DEBUG=noccs ./deqp-vk -n
dEQP-VK.api.image_clearing.core.clear_color_image.3d.optimal.
single_layer.r32g32b32a32_uint
Fixes: 78e24605db ("intel/isl: Reduce scope of Yf-disabling workaround")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>