The pixld.covmask instruction returns the coverage mask for the entire
pixel being shaded, not the set of samples covered by the current FS
invocation as required by GL/Vulkan. In order to get the GL/Vulkan
behavior, we have to mask off samples not covered by the current FS
invocation.
Previously, we handled this by masking by 1 << gl_SampleID. This
required us to force full sample shading whenever gl_SampleMaskIn[] was
used. Otherwise, we didn't know what to mask. Instead, this commit
switches us to using an array in CB0 which has a sample mask for each
sample, representing the set of samples in that sample's pass. Masking
by this allows us to get the full range of variability provided by
NVIDIA's multi-pass MSAA hardware. It also allows us to eliminate the
workaround that forced per-sample shading for gl_SampleMaskIn[] because
we can adjust the masks from the API side as needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29194>
Dota 2 and Counter-Strike 2 really want to be able to allocate memory
for both VkImages and VkBuffers from the same memory type. Xe2's
special compression-only memory type does not support buffers, which
makes these games crash. Disable CCS on these games as a workaround.
This is a temporary workaround as we're still working towards a
long-term solution (either by fixing the engine or finding a way
better expose our memory types).
Backport-to: 24.2
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11520
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11521
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>
These memory types are useless when CCS is disabled, don't leave them
there so they don't confuse applications.
Backport-to: 24.2
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>
Fixes the following test, as far as you enable Vulkan 1.3 (if not it
is skipped):
dEQP-VK.api.info.vulkan1p3_limits_validation.max_inline_uniform_total_size
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29476>
Xe KMD will now report the different topology mask types based on the
type of the EU of running platform.
With this we don't need to divide the EU count by 2 in intel_perf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30127>
Sync xe_drm.h with f2881dfdaaa9 ("drm/xe/oa/uapi: Make bit masks unsigned").
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30127>
Before resorting to scheduling instructions based on max_delay only,
postsched would prioritize instructions that have no hard delay (i.e.,
delays for which nops should be inserted) but might still have soft
delays (for which ss/sy needs to be inserted). Removing this has a
slight negative effect on nops but improves sstall/systall. This seems
to improve actual render pass time.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30437>
max_delay is a measure for the maximum delay from a node to the end of a
block. However, the current calculation would include the delay of the
node itself (which is the maximum delay from the node's sources). At the
point a node is scheduled, this source delay might already be much
smaller (or even zero) because its sources have hopefully been scheduled
much earlier. This means the max_delay value would be an overestimate of
the actual delay from the node to the block's end.
This commit fixes this and makes sure max_delay is always the maximum
delay from a node to the end of the block from the point where the node
is scheduled.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30437>
Since v_interp_p2_f32 with constant operands only happens on GFX11.5, this
should actually be fine in all cases where this is currently possible
(GFX11.5+ allows DPP with scalar src1). However, it does fail validation
because we haven't updated that yet.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30477>
`CI_NODE_INDEX` is only defined in `parallel:` jobs.
Without this, we end up with `--fraction-start --fraction 1`, which is
obviously invalid but somehow it hasn't blown up (yet).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30462>
Some of my colleagues have scripts using CSV format for measuring
frame timing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
Allows a driver to declare indirect arguments for its tracepoints and
pass an address. u_trace will request a copy of the data which should
be implemented on the command processor.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
We want to reduce the buffer allocations for other type of data than
timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
We're about to add indirect arguments, having a better way to describe
arguments (as capture/storage) will be useful.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
De-duplicate the code a bit and prepare for using this in clear_buffer function.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30284>
Our validation code doesn't need to know which bytes are accessed. It
only needs to know which grfs were accessed by an element. This also
helps to easily handle the Xe2 register size change.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
Previously this check would create a mask of the bytes used in the
grf, and then shift the mask. This worked well when there was 32 bytes
in the register because a 64-bit uint64_t could easily detect that
bytes were used in the next regiter. (The next register was the high
32-bits of the `access_mask` variable.)
With Xe2, the register size becomes 64 bytes, meaning this strategy
doesn't work. Instead of a mask, we can just check to see if more than
1 grfs are used during each loop iteration. (Suggested by Ken.) This
will make it easier to extend for Xe2 in a follow on commit.
Verified this with
dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize
on Xe2, which otherwise would cause the program to fail to validate
because it assumed a grf was 32 bytes.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>