Add a Foldable trait similar to what is already used in NAK for software
emulation of opcodes. Since Mali has many variations like V4I8 that run
the exact same operation independently on each component of the vector,
this commit also adds a FoldableComp trait that lets the implementor
focus on a single component and automatically implements Foldable.
We also add tests on OpShiftLop as an initial subject; we'll add most of
the arithmetic opcodes as time goes on to have a tight description of
the hardware.
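As a rough illustration of the idea (not the actual Kraid code), a
per-component trait can provide the whole-vector fold through a blanket
impl. Only the Foldable and FoldableComp names come from this commit; the
method signatures, the LANES constant and the AddV4I8 type are
assumptions made for this sketch.

```rust
/// Hypothetical signature: fold a whole instruction from 32-bit sources.
trait Foldable {
    fn fold(&self, srcs: &[u32]) -> u32;
}

/// Hypothetical signature: describe the operation on one component only.
trait FoldableComp {
    /// Number of components packed in a 32-bit register (4 for a
    /// V4I8-like variant, 1 for a plain 32-bit operation).
    const LANES: u32;

    /// Fold a single component; sources are already extracted and masked.
    fn fold_comp(&self, srcs: &[u32]) -> u32;
}

/// Anything that describes a single component gets Foldable for free.
impl<T: FoldableComp> Foldable for T {
    fn fold(&self, srcs: &[u32]) -> u32 {
        let bits = 32 / T::LANES;
        let mask = if bits == 32 { u32::MAX } else { (1u32 << bits) - 1 };
        let mut out = 0u32;
        for lane in 0..T::LANES {
            let shift = lane * bits;
            // Extract the same lane from every source.
            let comps: Vec<u32> =
                srcs.iter().map(|s| (*s >> shift) & mask).collect();
            // Run the per-component operation and repack the result.
            out |= (self.fold_comp(&comps) & mask) << shift;
        }
        out
    }
}

/// Illustrative opcode: wrapping add on four 8-bit components.
struct AddV4I8;

impl FoldableComp for AddV4I8 {
    const LANES: u32 = 4;

    fn fold_comp(&self, srcs: &[u32]) -> u32 {
        srcs[0].wrapping_add(srcs[1])
    }
}
```

With this shape, AddV4I8.fold(&[a, b]) adds each byte of a and b
independently, and carries never leak into the neighbouring component.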
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
Add the generic infrastructure to load/store the test data and compile
the shader, along with simple tests that use the hw_runner.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
This is a very small driver that just sends compute jobs to the graphics
card without any of the Vulkan or OpenGL indirections. For now it only
supports v10-v13 since that's what Kraid is targeting. Much of the
low-level code that handles CSF encoding and descriptor handling is in C
for simplicity (and because there is no genxml equivalent for Rust yet).
device.rs also implements a bare-bones memory-safe Rust abstraction for
Mali GPUs, as a treat.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
We'll need the extra assurance if we want to share the model across
threads.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
The compiler will also implement a very small driver that depends
on genxml and libpanfrost, so it needs to be defined after them, but
before clc.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
Previously libpanfrost depended on the panfrost compiler, which was only
used for the pan_disassemble function that disassembles and prints
shaders. We'll need to add a dependency from kraid tests to libpanfrost
and this made things harder due to meson shenanigans.
This commit breaks the dependency between libpanfrost and the compiler by
adding the disassembler as a callback, so that the user can provide their
own disassembler.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
If tests are enabled with the same name as the original crate, two entries
with identical names are placed in rust-project.json. rust-analyzer does
not like that, so rename the tests to "kraid_test" to fix it.
Also, meson rust tests are weird as they call rustc with the --test flag
directly, and rust-project.json does not see any test cfg option.
To have proper code analysis in #[cfg(test)] we need to specify that
option directly in meson (this means that rustc will see --test and
--cfg test at the same time; it doesn't seem to mind though).
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
Rust bindgen creates include dependencies that are relative to the
project root. That works perfectly if the build root is inside the
project root, but breaks when it's a separate directory.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42189>
The gen_opcodes custom target generates gen_opcodes.h,
gen_opcodes_private.h, and gen_opcodes.cpp, but idep_gen_opcodes_h only
declared gen_opcodes.h.
Declare gen_opcodes_private.h as well so that generated-header
dependencies are exported correctly to downstream hermetic build systems.
Fixes ninja-to-soong build failures due to missing gen_opcodes_private.h.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42248>
Felix:
- Fix typo in the end debug marker for update
Thanks to Kevron, who tested a couple of workloads on BMG:
- Hitman +50.3%
- F122 +26.8%
- SOTR +18%
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>
This commit adds a new debug option to dump the parent-child relationship
map using INTEL_DEBUG=bvh_pcrel_map.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>
Track where each leaf_id is encoded in the final BVH.
It's a map from leaf_id to final_bvh_offset. This will help us navigate
the BVH layout in the update pass.
The leaf block offset gives us: leaf id -> bvh block,
and the parent-child map can be used for: bvh block -> parent offset.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>
This map stores the parent BVH offset for each child. This will
help us walk the BVH layout later in the update pass.
Since we are tracking block indices, even with a 2^32-byte BVH we can
have at most 2^26 indices (each block is 64B wide), which leaves us 6 bits
in which we can track child slot occupancy in the parent.
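The bit budget above can be sketched as follows. This is only an
illustration: the field order (index in the low 26 bits, slot bits on
top), the function names and the interpretation of the 6 spare bits as a
slot mask are assumptions, not the layout used by the driver.

```rust
/// Each BVH block is 64 bytes wide, so a 2^32-byte BVH has 2^26 blocks.
const BLOCK_SIZE: u64 = 64;
const INDEX_BITS: u32 = 26;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

/// Pack a parent byte offset and 6 bits of child slot information into
/// one 32-bit map entry (hypothetical layout).
fn pack_parent(parent_offset: u64, slots: u8) -> u32 {
    assert!(parent_offset % BLOCK_SIZE == 0, "offset must be block aligned");
    let index = parent_offset / BLOCK_SIZE;
    assert!(index <= INDEX_MASK as u64, "BVH larger than 2^32 bytes");
    assert!(slots < (1 << 6), "only 6 spare bits are available");
    (index as u32) | ((slots as u32) << INDEX_BITS)
}

/// Recover the parent byte offset and the slot bits from a map entry.
fn unpack_parent(entry: u32) -> (u64, u8) {
    let offset = (entry & INDEX_MASK) as u64 * BLOCK_SIZE;
    let slots = (entry >> INDEX_BITS) as u8;
    (offset, slots)
}
```

Storing block indices instead of byte offsets is what frees the upper 6
bits: 32 - 26 = 6.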
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>
Extract leaf encoding into encode.h and move some of the helpers into
anv_build_helper.h.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39617>
Previously, we were also counting invalid nodes in the child block
count, which inserted holes in the BVH memory.
These holes in memory would trigger HW traversal hangs.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40858>
When a base is larger than the supported [min, max] bounds, we were
clamping the base to that range, and adding the rest. This works,
but it leaves us with a bunch of loads/stores with the same maximum
base, and different iadds for addresses. This isn't ideal, because
it means that every access has a different iadd.
Instead, flip it around: now we calculate the largest multiple of
(max + 1) which is no greater than base, and iadd that. Then the new base
becomes the remaining portion, which is guaranteed to be <= max.
With that, all loads/stores within a maximum-offset window share a
common iadd which can be CSE'd, and use the immediate offset field
for small deltas from there.
Note that this should work for negative offsets beyond the minimum
too; we do calculate a larger negative addition and then flip to
positive immediate offsets.
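The arithmetic can be sketched like this. It is a standalone
illustration, not the pass itself: the split_base name and the use of
i64 are assumptions, and the sketch ignores the min bound since the
remainder it produces is never negative.

```rust
/// Split `base` into (add, new_base), where `add` is the largest
/// multiple of (max + 1) not exceeding `base` and `new_base` is the
/// remainder, always in 0..=max.
fn split_base(base: i64, max: i64) -> (i64, i64) {
    let window = max + 1;
    // div_euclid rounds towards negative infinity for a positive
    // divisor, so negative bases get a larger negative addition and a
    // non-negative remainder.
    let add = base.div_euclid(window) * window;
    (add, base - add)
}
```

With a hypothetical max of 1023, bases 2050 and 2100 both split into an
add of 2048 (remainders 2 and 52), so the two accesses share one iadd
that CSE can merge. A base of -5 splits into an add of -1024 and a
remainder of 1019.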
Cuts 11% of instructions from the first compute shader of
dEQP-VK.ray_query.builtin.rayqueryterminate.comp.aabbs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42237>
These take a base offset that we can plug into the LSC extended
descriptor immediate. This is essentially the same improvement that we
made by switching to the ssbo_intel intrinsics.
Eliminates spilling in
dEQP-VK.ray_query.builtin.rayqueryterminate.comp.aabbs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42237>
The Anv driver doesn't ever set opts->softfp64 for the preprocess
stage (anv_shader_preprocess_nir()). The Vulkan preprocess stage is a
"physical device" stage, and softfp64 requires the actual anv_device:
see the comments for the preprocess_nir function pointer inside the
definition of struct vk_device_shader_ops, and the definition of
anv_ensure_fp64_shader().
It is only during anv_shader_compile() that we call
anv_ensure_fp64_shader(), where we actually build and store the
nir_shader we name fp64_nir. Then we have everything ready and we can
call the nir_lower_doubles pass.
To account for all that, just have brw check if opts->softfp64 is
actually set, and disable the full_software lowering if we don't have
it: otherwise we'll either segfault or hit the assert(softfp64) that
is in lower_doubles_instr_to_soft() in nir_lower_double_ops.c.
This prevents a segfault (or an assertion failure when in debug mode)
when running DIRT 5 on Tiger Lake.
Fixes: 7d3b62e13d ("anv: only load fp64 software shader when needed")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42105>
If the descriptor is allowed to be non-uniform, we don't have to
force helpers to keep it uniform.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>
Previously we only did this for loads, but the same logic applies here too.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>
We might as well make sure that those backends don't break on
future use. At least jay will probably use this pass.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42232>
A future GPU will have a larger sampler state, so make the necessary
adjustments to support a sampler state of any size at run time.
For now ANV_SAMPLER_STATE_GPU_SIZE does a dummy check, because without it
the compiler complains that device is unused.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42023>
This issue happens in a couple of places, but here is the main problem:
ANV_SAMPLER_STATE_SIZE is 32 bytes long (no idea why), but SAMPLER_STATE in
the GPU is 16 bytes long.
anv_sampler_state::state and anv_sampler_state::state_no_bc have 16 bytes of
storage, but in some places we do a memcpy of ANV_SAMPLER_STATE_SIZE bytes,
like in anv_GetDescriptorEXT():
    memcpy(pDescriptor, sampler->state.state[0], ANV_SAMPLER_STATE_SIZE);
So let's replace the magic numbers with macros, give the CPU data
ANV_SAMPLER_STATE_SIZE bytes of storage, and only when copying to the GPU
copy the exact size that the GPU expects, ANV_SAMPLER_STATE_GPU_SIZE.
Cc: stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42023>
Cleans up the final halt in
dEQP-VK.rasterization.frag_side_effects.color_at_beginning.terminate_invocation
with the terminate lowering.
O(1) for the function so that's pretty good.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42219>
* implement terminate
* fix HALT brokenness on all shader stages (we need a real end block)
* optimize demote codegen a ton
* optimize gl_HelperInvocation/gl_SampleMask
* optimize "all lanes demoted" via HALT.any
* optimize scheduling of stores/atomics/demotes in FS
* optimize some texturing with helper invocations
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42097>