Previously some KeplerA chips failed various dEQP tests when instruction
scheduling was enabled.
In particular, `memory_model.message_passing` had issues where a
`membar` instruction canceled some in-flight predicate writes, and
`barrier.write_image_tess_control_read_image_compute.image_128_r32_uint`
had issues around the `Cont` instruction.
This patch refines instruction scheduling to better match the output of
nvcc. Fixing the various dEQP failing tests.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13528
Fixes: c35990c4bc ("nak: Add real instruction dependencies for Kepler")
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36393>
Optimizations cutting down on GPRs often lead to the akward situations
where RA being more restricted and having to insert more mov instructions
pumping up the instruction counts.
In order to give developers more reliable stats we just set the max_gprs
to the next multiple of 8 including taking hw reserved registers into
account.
This does not impact occupancy in any way despite the increase in gprs.
Totals:
CodeSize: 920980864 -> 914748784 (-0.68%); split: -0.69%, +0.02%
Number of GPRs: 3544248 -> 3879749 (+9.47%)
Static cycle count: 217345431 -> 216414194 (-0.43%); split: -0.50%, +0.07%
Totals from 78493 (89.58% of 87622) affected shaders:
CodeSize: 795883088 -> 789651008 (-0.78%); split: -0.80%, +0.02%
Number of GPRs: 3108571 -> 3444072 (+10.79%)
Static cycle count: 187450578 -> 186519341 (-0.50%); split: -0.58%, +0.08%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36514>
The warning is:
../../src/util/dbghelp.h:900:10: warning: the current #pragma pack alignment value is modified in the included file [-Wpragma-pack]
900 | #include <pshpack4.h>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36708>
For example Xe2 uses the LSC and doesn´t need the shifting, so let's
just apply it where it's needed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
So that the driver can allocate an array of relocations using
BRW_SHADER_RELOC_EMBEDDED_SAMPLER_HANDLE + number_of_embedded_samplers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
Ray tracing pipelines can contain unlimited number of shaders unlike
compute/graphics ones. Having the driver finding the maximum
scratch/ray-query/stack usage can be time consumming when this can be
stored on the pipeline and the runtime tell the driver at bind time.
These fields are unused for other shaders and so drivers can ignore
them.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
The warning is:
dxcapi.h:694:1: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
694 | CROSS_PLATFORM_UUIDOF(IDxcVersionInfo2, "fb6904c4-42f0-4b62-9c46-983af7da7c83")
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/microsoft/compiler/dxcapi.h:114:59: note: expanded from macro 'CROSS_PLATFORM_UUIDOF'
114 | byte_from_hexstr(spec + 32), byte_from_hexstr(spec + 34))
Note: spec is a string literal: "fb6904c4-42f0-4b62-9c46-983af7da7c83"
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36707>
v2: rework to use Rust closure for worker job function
v3: split preparatory restructuring into separate commit
v4: parallelize link and compile, adjust thread/job count
v5: split out naming changes to later commit, move validation to api/
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36497>
v2: rework to use Rust closures for passed-in job function
v3: drop mutability requirement on queue for adding a job
v4: prevent external creation of fences, return from add_job_sync()
v5: add CPU count utility function based on util_get_cpu_caps()
v6: use &CStr for queue name for convenience
v7: make fence Send + Sync and don't require mutability for waiting
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36497>
This allows for a later refactor to share linking code between
clBuildProgram and clLinkProgram in which the device build is borrowed
mutably.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36497>
This makes it easier to verify if the host allocation a user-ptr bo is
assigned to still exists. The kernel rejects command submissions with
user-ptr bos pointing to non-mapped host memory, so this makes it easier
to debug those.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36701>
We need to release user_ptr resources earlier, so we don't keep stale
references around, but for that to happen we also need to know which
resource is a user_ptr one in the first place.
Cc: mesa-stable
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36701>
Rusticl running on zink might end up creating an 1D image from a host_ptr
and zink might bind it with VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT.
That ended up hitting an assert inside anv_device_map_bo.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36701>
What drivers state programming in the 3D pipeline is the mesh shader.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36734>
We can optimize the VUE layout in cases where all shaders are compiled
together and some outputs are unused. So we need to have consistent
clip/cull_distance_mask with the VUE.
Previously we could have a VUE without ClipDistance present in the
header and yet have a non zero clip_distance_mask. This would trip the
HW into taking into account a VUE field that doesn't exist.
Here we set the clip/cull_distance_mask to 0 if the associated output
is not written by the shader. The written outputs are always
consistent with what's in the VUE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2d396f6085 ("intel: prepare VUE layout for more than 2 layouts")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13685
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36734>
For unaligned invocations, don't launch two COMPUTE_WALKER, instead we
can mask off excessive invocations in the shader itself at nir level and
launch one additional workgroup.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>
Drivers that doesn't support direct unaligned dispatches, they can use
the shader creation flag to lower unaligned dispatches.
Suggested-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>
Took inspiration from RADV driver changes. This allow us to get rid of
our local helper get_pipeline_spv().
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>