si_sqtt_start / si_sqtt_stop use emit_barrier, which clears barrier_flags.
Since these functions build an auxiliary cs which will only be emitted
later (on sqtt enablement/disablement), they shouldn't clear the global
barrier_flags value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39308>
The pattern:
sctx->barrier_flags |= ...;
si_mark_atom_dirty(sctx, &sctx->atoms.s.barrier);
is used a lot, let's add an inline helper. This prevents
forgetting the call to si_mark_atom_dirty.
si_upload_bindless_descriptors is special because we're
already in the emit phase so we shouldn't dirty barrier
again.
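
A minimal sketch of what such a helper could look like. The sketch_ types
and function names are hypothetical stand-ins, not the driver's actual
definitions; only the "OR in the flags, then dirty the atom" shape comes
from the pattern above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's context and atom types. */
struct sketch_atom {
   bool dirty;
};

struct sketch_context {
   uint32_t barrier_flags;
   struct sketch_atom barrier_atom;
};

/* Stand-in for si_mark_atom_dirty(): just records that the atom is dirty. */
static inline void
sketch_mark_atom_dirty(struct sketch_context *ctx, struct sketch_atom *atom)
{
   (void)ctx;
   atom->dirty = true;
}

/* The helper: accumulate the flags and dirty the barrier atom in one call,
 * so a caller cannot do the first half and forget the second. */
static inline void
sketch_add_barrier_flags(struct sketch_context *ctx, uint32_t flags)
{
   ctx->barrier_flags |= flags;
   sketch_mark_atom_dirty(ctx, &ctx->barrier_atom);
}
```

A caller that is already in the emit phase would keep writing to
barrier_flags directly instead of going through the helper.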
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39308>
Some newer gen8 devices (like a840/kaanapali, but not x2-85 which is
otherwise similar) flip the hw shading rate value around to match
vulkan/gl instead of DX.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39167>
Narrowing integer conversions on SALU with GPR src do not behave as one
would expect on gen8, so avoid them. This does not apply to uGPR srcs
or float conversions.
See, for example:
dEQP-VK.glsl.builtin.function.integer.bitcount.int_highp_compute
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39167>
Otherwise the first reg in the array ends up with offset=0, which
signals the end of parsing the magic_raw regs. Which is not the
desired outcome :-)
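
To illustrate the failure mode, here is a sketch of a sentinel-terminated
table walk. The struct layout and function name are assumptions made for
the example, not the actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register entry; an offset of 0 terminates the table. */
struct sketch_reg {
   uint32_t offset;
   uint32_t value;
};

/* Count the entries before the first offset == 0 sentinel. */
static size_t
sketch_count_regs(const struct sketch_reg *regs)
{
   size_t n = 0;

   while (regs[n].offset != 0)
      n++;

   return n;
}
```

If the first entry has offset=0, the walk stops immediately and every
following reg is silently ignored.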
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39167>
The field name needs to match between variants. In this case, the
register is the same, just with different offset. So use a bitset.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39167>
Now that we have intrinsics which map directly to the hardware opcodes,
we can lower PLS inside the gallium driver instead of the back-end
compiler having to know anything about it. This simplifies the back-end
and is less code, if you ignore the new copyright header.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
This is a little more manual (though it's actually less code) but it
gives us a lot more control and makes the whole flow nicer.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
Like we just did with load_tile_pan, this maps directly to ST_TILE in
the hardware. This is more versatile and lets us do more of our
lowering in NIR.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
Instead of making it explicitly about outputs, this switches it to
being a NIR version of LD_TILE. It means we have to do a bit of work in
NIR and add a builder helper but the end result is something much more
versatile.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
Previously common wsi had a special submission to install an implicit
fence to wsi memory directly, which has been deprecated in favor of bonfire
implicit fencing (implicit fencing has been turned into explicit fencing
within vulkan since then). The virtgpu backend is fine but the vtest
backend has regressed since then, relying only on the renderer side hw
driver doing implicit fencing.
With async present having landed earlier, we can directly tell which
submission is done by common wsi, and can revive the idle waiting
accordingly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39377>
The gfx 12.5 struct has only one major difference from gfx 9, the OaCntr
length: on gfx 9 it is 36 uint64_t long, while on gfx 12.5 it is 38
uint64_t long.
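
As a rough illustration of the size difference only (the struct names are
hypothetical and every other field of the real structs is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Counter array lengths taken from the description above. */
#define SKETCH_GFX9_OA_COUNTERS   36
#define SKETCH_GFX125_OA_COUNTERS 38

/* Hypothetical, reduced to the one field that differs in length. */
struct sketch_report_gfx9 {
   uint64_t OaCntr[SKETCH_GFX9_OA_COUNTERS];
};

struct sketch_report_gfx125 {
   uint64_t OaCntr[SKETCH_GFX125_OA_COUNTERS];
};
```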
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Handling for gfx12.5 is missing, and adding it requires a switch case over
verx.
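
A sketch of the kind of dispatch this implies. The function name and the
verx10-style encoding (90 for gfx9, 125 for gfx12.5) are assumptions for
the example; the counter lengths are the ones quoted for gfx9 and gfx12.5.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the OaCntr length for a hardware version, or 0 if the version
 * is not handled by this sketch. */
static size_t
sketch_oa_cntr_len(int verx10)
{
   switch (verx10) {
   case 90:
      return 36;
   case 125:
      return 38;
   default:
      return 0;
   }
}
```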
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Looking at the reference code, there is no new struct for Xe3 so it should
use the same struct as Xe2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
With no more users of intel_perf_load_configuration() it can be
removed with other i915 functions around it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
We have no usage of the information returned by
intel_perf_load_configuration(). It is only used to add a copy of the
configuration so we have the metric id but we could instead get the
metric id from sysfs, that is added by mdapi.
Xe KMD doesn't have a uAPI to query the metrics configuration, so
using sysfs also fixes the integration of mdapi with Xe KMD.
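
A minimal sketch of reading a numeric id from an already opened file. The
function name is hypothetical and the sysfs path is deliberately left to
the caller, since it is not stated here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse an unsigned decimal id from the start of the stream.
 * Returns false if the stream is NULL or does not begin with a number. */
static bool
sketch_read_metric_id(FILE *f, uint64_t *id)
{
   return f != NULL && fscanf(f, "%" SCNu64, id) == 1;
}
```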
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>