intel_device_info_init_common calculates this for Gfx9+ based on
max_threads_per_psd and slice information. Mark it as zero in the
structures to make clear that the value there isn't useful, and make
it easier to diff binaries for the next commit's refactors.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
The documentation for 3DSTATE_URB_HS has 0 as the minimum number of HS
URB entries for all platforms. See BSpecs 32162, 47137, 56271 for
Gfx6-11, Xe, and Xe2-3, respectively.
This should silence warnings about our device info field not matching
the hwconfig tables.
Notably, nothing in our drivers currently uses this value so it cannot
have a functional impact.
Fixes: 4064b5546b ("intel/dev: reduce warning noise from urb settings")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This is only needed for original 965G/GM clipper code, which only exists
in the legacy compiler. Send it off to live with the elk.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This is used in exactly one place in crocus, which already has a comment
indicating that this code is needed for original Gfx4 hardware. Just
replace that with a verx10 == 40 check.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This is used by a single place in ISL only for sanity checking the
decisions it has already made. The knowledge is already all centralized
in ISL these days, so we don't need a device info bit.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
Add an optimization pass to turn regular SENDs into SEND_GATHERs.
This allows the payload to be "broken" into smaller pieces that
can be further optimized, which _may_ result in
- less register pressure (no need to contiguous space), and
- less instructions (no need to MOV to such space).
For debugging, the INTEL_DEBUG=no-send-gather option skips this
optimization, and reporting how many opportunities were missed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
For Xe3+, set preferred SLM and SLM per threadgroup size.
Bspec: 73211
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32872>
This updates entry for 14017823839 which fixes issues on BMG with:
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32550>
Rather than checking hwconfig items when using them, wait until after
devinfo has been fully initialized. This includes having workarounds
implemented.
We can then check if the hwconfig data and final Mesa initialization
agree. If the match fails, we need to investigate if Mesa or the
hwconfig data is wrong.
This code becomes a no-op when not on a release build.
Fixes: a4c5bfd34c ("intel/dev: Use hwconfig for urb min/max entry values")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12141
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>
Real Xe KMD actually returns a uint64, so here changing from uint32
to uint64.
Fixes: 04bdbeec31 ("intel/dev/xe: Fix access to eu_per_dss_mask")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32527>
DRM_XE_TOPO_EU_PER_DSS and DRM_XE_TOPO_SIMD16_EU_PER_DSS can be any
number of bytes long but it was assuming it was always 4 bytes long.
That was not a issue because Xe KMD return 4 bytes even if only needs
1 or 2 bytes but that is a problem with our HW simulator that was
returning 2 bytes.
Fixes: a24d93aa89 ("intel/dev: Query and compute hardware topology for Xe")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32307>
This commit allows you to dump different regions of memory related to
bvh building. An additional script to decode the memory dump is also
added, and you're able to view the built bvh in 3D view in html. See the
included README.md for usage.
Rework:
- you can now view the actual child_coord in internalNode in html
- change exponent to be int8_t in the interpreter
- fix the actual coordinates using an updated formula
- now you can have 3D view of the bvh
- blockIncr could be 2 and vk_aabb should be first
- Now, if any bvh dump is enabled, we will zero out tlas, to prevent gpu
hang caused by incorrect tlas traversal
- rootNodeOffset is back to the beginning
- Add INTEL_DEBUG=bvh_no_build.
- Fix type of dump_size
- add assertion for a 4B alignment
- when clearing out bvh, only clear out everything after
(header+bvh_offset)
- TODO: instead of dumping on destory, track in the command buffer
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>
No functional change, just move i915 specific data to a struct
and check for kmd_type where appropriate. This will make the
next patch (which adds Xe KMD support here) cleaner.
This patch had to make intel_kmd.h header C++ friendly so it
can use its symbols.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32194>
Rather than updating intel_device_info_update_l3_banks(), the Xe KMD
provides this info via the DRM_XE_DEVICE_QUERY_GT_TOPOLOGY query item.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31894>
intel_device_info_update_after_hwconfig() updates max_cs_threads
based on max_eus_per_subslice and num_thread_per_eu but in some
platforms simulator the hwconfig don't have the
INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS value, causing max_cs_threads to
be set to a wrong value and then causing issues when programing
CFE_STATE with a invalid value.
Fortunately we can also get max_eus_per_subslice from topology query,
so here moving the hwconfig query and
intel_device_info_update_after_hwconfig() call to after topology.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31850>
Although we don't want to rely on hwconfig for devinfo->verx10 == 120,
due to the dependence on closed source software, we do check to see if
hwconfig reports different values in the DEVINFO_HWCONFIG macro.
Matt was seeing this warning on 8086:a7a0:
> MESA: warning: INTEL_HWCONFIG_TOTAL_PS_THREADS (128) != devinfo->max_threads_per_psd (64)
Reported-by: Matt Turner <mattst88@gmail.com>
Fixes: 3e4f73b3a0 ("intel/dev: Update hwconfig => max_threads_per_psd for Xe2")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31077>
Before the patch, intel_device_info_get_max_preferred_slm_size()
returns values in kilobytes, but then
intel_device_info_get_max_slm_size() is multiplying it by 1024.
As a result, LNL is reporting maxComputeSharedMemorySize to be
134217728, which is 128mb.
Fix this by making intel_device_info_get_max_slm_size() not multiply
it by 1024.
This should fix at least the following dEQP tests:
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.128
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.16
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.2
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.4
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.64
Some tests were failing with:
deqp-vk: ../../src/intel/common/intel_compute_slm.c:24: slm_encode_lookup: Assertion `kbytes <= table[table_len - 1].size_in_kb' failed.
while other tests were triggering the OOM.
v2:
- Make everybody return sizes in bytes (José).
v3:
- Rename variable to bytes (José, Jordan).
Fixes: fd368f5521 ("anv: Set maxComputeSharedMemorySize value for Xe2 platforms")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30541>
Xe KMD will now report the different topology mask types based on the
type of the EU of running platform.
With this we don't need to divide the EU count by 2 in intel_perf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30127>
Gfx 12.5 moved scratch to a surface and SURFTYPE_SCRATCH has this pitch
restriction:
RENDER_SURFACE_STATE::Surface Pitch
For surfaces of type SURFTYPE_SCRATCH, valid range of pitch is:
[63,262143] -> [64B, 256KB]
The pitch of the surface is the scratch size per thread and the surface
should be large enough to accommodate every physical thread.
So here adding a new field to intel_device_info, setting it in
intel_device_info_init_common() so even offline tools can have it set.
And finally adding a check to fail shader compilation if needed
scratch is larger than supported.
This issue can be reproduced in debug builds when running
dEQP-VK.protected_memory.stack.stacksize_1024 on Gfx 12.5 or newer
platforms.
Ref: BSpec 43862 (r52666)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30271>