The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.
Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event, so that it occurs only once
per second. This reduces the impact of recording gpu.renderstages from
-8% to -4%.
More concretely, FPS measurements when tracing the Unity BoatAttack demo
on an Intel ADL device:
* gpu.renderstages disabled: 48.044293667
* gpu.renderstages enabled: 38.119778333 (-20.66%)
* gpu.renderstages enabled + this fix: 42.641818333 (-11.24%)
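The once-per-second throttling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Mesa code: Linux `clock_gettime(CLOCK_BOOTTIME)` stands in for Perfetto's `GetBootTimeNs()`, and `struct clock_sync_state`, `boot_time_ns()` and `should_emit_clock_sync()` are hypothetical names.

```c
#define _GNU_SOURCE /* make CLOCK_BOOTTIME visible on Linux/glibc */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical per-data-source state: boot time of the last clock sync. */
struct clock_sync_state {
   uint64_t last_sync_boot_ns;
   bool has_synced;
};

/* Stand-in for Perfetto's GetBootTimeNs(): CLOCK_BOOTTIME in nanoseconds. */
static uint64_t boot_time_ns(void)
{
   struct timespec ts;
   clock_gettime(CLOCK_BOOTTIME, &ts);
   return (uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/* Returns true at most once per second. Only then would the caller issue
 * the expensive DRM_XE_DEVICE_QUERY_ENGINE_CYCLES query and emit the
 * BUILTIN_CLOCK_BOOTTIME clock sync event. */
static bool should_emit_clock_sync(struct clock_sync_state *s, uint64_t now_ns)
{
   if (s->has_synced && now_ns - s->last_sync_boot_ns < NSEC_PER_SEC)
      return false;
   s->last_sync_boot_ns = now_ns;
   s->has_synced = true;
   return true;
}
```

A caller would check `should_emit_clock_sync(&state, boot_time_ns())` before each render-stage event and only pay for the ioctl when it returns true. Passing the current time in, rather than reading the clock inside, keeps the throttling logic easy to test.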
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
Apply any outstanding accumulated PC bits before we proceed with building
the acceleration structure.
There are two reasons for this:
- some of the data accessed by the build might need to be flushed
as a result of a previous barrier
- the scratch buffer might get reused between builds
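The idea of flushing accumulated bits lazily, right before the build, can be sketched with a hypothetical command buffer that collects pending pipe-control (PC) bits from barriers. The names `struct cmd_buffer`, the `PC_*` flags, `emit_pipe_control()` and `apply_pending_pc_bits()` are illustrative only and do not correspond to the anv code.

```c
#include <stdint.h>

/* Hypothetical pipe-control flush/invalidate bits. */
enum pc_bits {
   PC_DATA_CACHE_FLUSH = 1u << 0,
   PC_CS_STALL         = 1u << 1,
   PC_CONST_INVALIDATE = 1u << 2,
};

/* Hypothetical command buffer: barriers only accumulate bits here; they
 * are emitted before the next operation that depends on them. */
struct cmd_buffer {
   uint32_t pending_pc_bits;
   uint32_t emitted_pc_bits; /* records what was emitted, for the example */
};

static void emit_pipe_control(struct cmd_buffer *cmd, uint32_t bits)
{
   /* A real driver would write a PIPE_CONTROL packet here. */
   cmd->emitted_pc_bits |= bits;
}

static void apply_pending_pc_bits(struct cmd_buffer *cmd)
{
   if (cmd->pending_pc_bits) {
      emit_pipe_control(cmd, cmd->pending_pc_bits);
      cmd->pending_pc_bits = 0;
   }
}

static void build_acceleration_structure(struct cmd_buffer *cmd)
{
   /* Flush first: the build may read data written before a barrier, and
    * the scratch buffer may still be in use by a previous build. */
   apply_pending_pc_bits(cmd);
   /* ... dispatch the build kernels ... */
}
```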
Cc: mesa-stable
Closes: #13711
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36951>
is_format_supported() doesn't really differentiate between read and
write, so require both.
Fixes a number of CL CTS tests that failed because of a garbage
(uninitialized) storage_descriptor.
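A sketch of the "require both" rule, using a hypothetical stand-in for a driver format query such as `is_format_supported()`. The `USAGE_READ`/`USAGE_WRITE` flags and `toy_is_format_supported()` are illustrative assumptions, not the real gallium bind flags or callback signature.

```c
#include <stdbool.h>

/* Hypothetical usage flags passed to the format query. */
enum fmt_usage {
   USAGE_READ  = 1u << 0,
   USAGE_WRITE = 1u << 1,
};

typedef bool (*is_format_supported_fn)(int format, unsigned usage);

/* Toy query for the example only: formats 0-3 are readable, but only
 * formats 0-1 are writable. */
static bool toy_is_format_supported(int format, unsigned usage)
{
   if (usage & USAGE_WRITE)
      return format >= 0 && format < 2;
   return format >= 0 && format < 4;
}

/* A storage image may be both read and written by a kernel, so only
 * accept a format when the query reports support for both. */
static bool storage_format_ok(is_format_supported_fn query, int format)
{
   return query(format, USAGE_READ) && query(format, USAGE_WRITE);
}
```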
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37089>
This was copied+pasted from the old GL driver but we never use any of
those PTE kinds since they're only for compressed depth pre-Turing.
We've also never proven that this is actually good for anything. Just
delete it for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37094>
If the implementation does not actually replay the VA, it must return 0
so as not to violate:
"If the memory object was allocated with a non-zero value of
opaqueCaptureAddress, the return value must be the same address."
Fixes RenderDoc capture replay, which asserts that this spec rule is
followed.
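The rule above can be sketched with a hypothetical device-memory struct (not the hk driver's types): an address is only reported when the driver can reproduce it for a later allocation that passes it back as opaqueCaptureAddress; otherwise 0 is returned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device-memory object. */
struct device_memory {
   uint64_t va;        /* GPU virtual address actually assigned */
   bool va_replayable; /* true if this VA can be re-created on replay */
};

/* Sketch of vkGetDeviceMemoryOpaqueCaptureAddress()-style behaviour:
 * never hand out an address that a replayed allocation would not get. */
static uint64_t get_memory_opaque_capture_address(const struct device_memory *mem)
{
   return mem->va_replayable ? mem->va : 0;
}
```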
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37090>
Add several shortcut names to select physical devices by their device
type.
In particular, this aims to make it easy to switch between igpu and
dgpu, as well as to test with lavapipe.
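A sketch of such a lookup with `strncasecmp()`, mirroring the `VkPhysicalDeviceType` values in a local enum so the example builds without the Vulkan headers. The shortcut spellings (`igpu`, `dgpu`, `vgpu`, `cpu`) and `parse_device_type_shortcut()` are assumptions for illustration; the actual names added by this change are not listed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Local mirror of VkPhysicalDeviceType (same numeric values). */
enum phys_dev_type {
   DEV_TYPE_OTHER = 0,
   DEV_TYPE_INTEGRATED_GPU = 1,
   DEV_TYPE_DISCRETE_GPU = 2,
   DEV_TYPE_VIRTUAL_GPU = 3,
   DEV_TYPE_CPU = 4,
};

/* Hypothetical shortcut table. */
static const struct {
   const char *name;
   enum phys_dev_type type;
} shortcuts[] = {
   { "igpu", DEV_TYPE_INTEGRATED_GPU },
   { "dgpu", DEV_TYPE_DISCRETE_GPU },
   { "vgpu", DEV_TYPE_VIRTUAL_GPU },
   { "cpu",  DEV_TYPE_CPU }, /* e.g. lavapipe */
};

/* Case-insensitive exact match of a user-supplied selector against the
 * table. Returns false if the string is not a known shortcut. */
static bool parse_device_type_shortcut(const char *s, enum phys_dev_type *out)
{
   for (size_t i = 0; i < sizeof(shortcuts) / sizeof(shortcuts[0]); i++) {
      size_t len = strlen(shortcuts[i].name);
      /* The prefix must match and the input must end right there. */
      if (strncasecmp(s, shortcuts[i].name, len) == 0 && s[len] == '\0') {
         *out = shortcuts[i].type;
         return true;
      }
   }
   return false;
}
```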
v2:
- rebase and reformat
- use strncasecmp and VkPhysicalDeviceType
- only print debug message when enabled
Signed-off-by: Benjamin Otte <otte@redhat.com>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> (v2)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36717>
A BO can be larger than the requested size due to page-size alignment.
Sanitize the IB size derived from the BO at all call sites that create
IBs for CS, so we never exceed the HW limit.
This was found when replaying the following capture:
Sid Meiers Civilization VI_289070 _dx11_World Map_ultra_720p.rdc
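The clamping described above can be sketched as a helper that derives the IB size from the BO size but never lets it exceed the limit. `MAX_IB_SIZE_DWORDS` and `ib_size_dwords_from_bo()` are hypothetical; the real hardware limit is not stated here.

```c
#include <stdint.h>

/* Hypothetical HW limit on a single IB, in dwords. */
#define MAX_IB_SIZE_DWORDS 0xfffffu

/* Derive the IB size in dwords from a (page-aligned, possibly larger than
 * requested) BO size, clamped so it never exceeds the hardware limit. */
static uint32_t ib_size_dwords_from_bo(uint64_t bo_size_bytes)
{
   uint64_t dwords = bo_size_bytes / 4;
   return dwords > MAX_IB_SIZE_DWORDS ? MAX_IB_SIZE_DWORDS : (uint32_t)dwords;
}
```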
Fixes: aa392e1ec2 ("tu: Align BO size to page size")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37082>
Setting the old window coordinate to NaN is more likely to hide the
problem in debug builds, because the NaN vertices are dropped later in
the pipeline, either through explicit NaN checks or implicit line
length checks.
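The "implicit line length check" case relies on IEEE-754 semantics: every ordered comparison involving NaN is false, so a test written as "keep the line if it is long enough" silently drops a NaN vertex. A small illustration; `line_long_enough()` and its threshold are hypothetical, not the actual pipeline code.

```c
#include <math.h> /* NAN, used by callers to build a bad vertex */
#include <stdbool.h>

/* Hypothetical line-setup test: keep the line only if its squared length
 * exceeds a tiny threshold. With a NaN endpoint, dx or dy is NaN, so len2
 * is NaN, and NaN >= x is false: the line is dropped without any explicit
 * isnan() check. */
static bool line_long_enough(float x0, float y0, float x1, float y1)
{
   float dx = x1 - x0, dy = y1 - y0;
   float len2 = dx * dx + dy * dy;
   return len2 >= 1e-12f;
}
```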
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36653>
Registers that don't need to be pinned to a channel right from the start
can instead be pinned when the instructions writing to them are scheduled.
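Deferred pinning can be modelled with a toy sketch (hypothetical types, not the r600/sfn classes): a register starts unpinned and takes the channel of the slot its writer is scheduled into.

```c
#include <stdbool.h>

#define CHAN_UNPINNED (-1)

/* Hypothetical register: chan stays CHAN_UNPINNED until scheduling. */
struct reg {
   int chan;
};

/* Hypothetical instruction writing one destination register. */
struct instr {
   struct reg *dest;
};

/* Called when the scheduler tries to place 'in' into a slot bound to
 * 'slot_chan'. An unpinned destination is pinned to that channel now;
 * an already pinned one only fits a slot with the same channel. */
static bool schedule_and_pin(struct instr *in, int slot_chan)
{
   if (in->dest->chan == CHAN_UNPINNED) {
      in->dest->chan = slot_chan;
      return true;
   }
   return in->dest->chan == slot_chan;
}
```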
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36921>
Results of TEX and FETCH are automatically pinned to a group when
instructions are created from a string. With the new scheduling code,
channel pinning might be added, and this needs to be handled when
reading the expectation shaders.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36921>
Missing 32-bit entry point in GLSL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 2ce20170 ("mesa: Add support for GL_EXT_shader_clock")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36041>
When using FDM, we need approximately square tiles to maintain
proper density distribution across the framebuffer.
Tiles that are much too wide or too tall would distort the density
mapping, causing areas intended for low density to receive higher
density and vice versa.
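One simple way to keep tiles near-square is shown below as a hypothetical sketch (not turnip's tiling code): start from the framebuffer size and repeatedly halve the longer side until the tile fits a pixel budget, so each step moves the aspect ratio toward 1 instead of producing long, thin tiles.

```c
#include <stdint.h>

/* Hypothetical helper: shrink a framebuffer-sized tile until it fits
 * max_pixels, always halving (rounding up) the longer side. */
static void pick_square_tile(uint32_t fb_w, uint32_t fb_h, uint32_t max_pixels,
                             uint32_t *tile_w, uint32_t *tile_h)
{
   uint32_t w = fb_w, h = fb_h;
   if (max_pixels == 0)
      max_pixels = 1; /* guarantee termination at 1x1 */
   while ((uint64_t)w * h > max_pixels) {
      if (w >= h)
         w = (w + 1) / 2;
      else
         h = (h + 1) / 2;
   }
   *tile_w = w;
   *tile_h = h;
}
```

For a 1920x1080 framebuffer and a 65536-pixel budget this yields 240x270 tiles rather than, say, 1920x34 strips.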
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37032>
The original lower_phis_to_regs_block() is a little too clever. It
crawls up the predecessor tree until it finds a cross edge and places
the register writes as deep as it can. This breaks nak_nir_lower_cf().
Say you have a shader like:

    con %0 = load_uniform()
    con loop {
        if div {
        } else {
        }
        break;
    }
    con %1 = phi %0

The original lower_phis_to_regs_block() will turn it into:

    con %0 = load_uniform()
    con %r = decl_reg();
    con loop {
        if div {
            reg_store(%r, %0)
        } else {
            reg_store(%r, %0)
        }
        break;
    }
    con %1 = reg_load(%r)

We then convert it into unstructured control-flow and run regs_to_ssa()
to get our phis back, which lowers each of the registers we inserted to
a phi tree. When we try to recover divergence information on phis by
looking at their sources, this works fine if each source maps directly
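to a reg_store() which maps directly to a phi in the original IR.
However, because the reg_store() instructions are placed deeper, they
may introduce false divergence: in the example above, both stores sit
inside the divergent if/else even though %0 is uniform and the original
phi was convergent.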
Switch to the simple version of nir_lower_phis_to_regs_block() which
places reg writes directly in phi predecessor blocks. We could probably
be more conservative and just avoid placing writes to uniform regs in
divergent control-flow but it's more robust to make the load/store_reg
intrinsics match the original phis directly.
This fixes some shaders in Horizon: Zero Dawn Remastered.
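The "simple" placement can be modelled with a deliberately tiny, hypothetical IR (not NIR's data structures): exactly one store per phi source, appended to that source's predecessor block. In the example shader, %1's only source comes from the block containing the `break`, which follows the divergent if, so the store lands there rather than in both branches.

```c
#include <stddef.h>

#define MAX_STORES 8

/* Hypothetical toy IR block: records the values stored to a single
 * register at the end of the block (values are SSA indices). */
struct block {
   int id;
   int store_values[MAX_STORES];
   size_t num_stores;
};

/* One phi source: the value flowing in from a given predecessor block. */
struct phi_src {
   struct block *pred;
   int value;
};

/* Simple lowering: one reg store per phi source, placed at the end of
 * that source's predecessor block, without walking further up the CFG
 * to sink stores into earlier (possibly divergent) blocks. */
static void lower_phi_to_reg_simple(const struct phi_src *srcs, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      struct block *b = srcs[i].pred;
      if (b->num_stores < MAX_STORES)
         b->store_values[b->num_stores++] = srcs[i].value;
   }
}
```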
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>