BTD unit will keep accumulating the threads and then eventually dispatch
those active threads once it reaches the counter.
I guess dispatching too fast will not have full occupancy at the BTD
unit, instead we just pick the half of max value for counter.
This patch also add drirc option to dispatch_timeout_counter and tweak
values internally with respect to HW limits. Default value we have right
now is 512 clocks, we can for sure tune it per app.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>
This can be used to workaround problem cases with application
controlled subgroup size.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40813>
../src/intel/vulkan/genX_init_state.c: In function ‘gfx9_CreateSampler’:
../src/intel/vulkan/genX_init_state.c:1507:40: warning: ‘border_color_offset’ may be used uninitialized [-Wmaybe-uninitialized]
1507 | sampler_state.BorderColorPointer = border_color_offset;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41116>
With SPV_KHR_constant_data, it's allowed to specialize array of
constants.
RustiCL changes are from Karol Herbst <kherbst@redhat.com>.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>
This patch enables dynamic stack ID control on Xe3+.
Programmed values are the recommended settings from the Bspec.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41066>
This will make this function more generic allowing us to use it for
COMPUTE_WALKER_2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>
CBV resources are supposed to be 256B aligned
(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).
vkd3d-proton will puts CBV addresses in the push constant data and do
global loads on them. Unfortunately those loads don't have a 256B
alignment value on them. So when looking at what we can promote to HW
push buffers, we can't consider them.
This change introduces a detection pass for CBV resources (according
to vkd3d-proton devs those are 64KiB in size) and realign the loads to
be 256B aligned.
This is only enabled on DX emulation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>
StackSizePerRay is the RTDispatchGlobals::AsyncStackSize and
DisableRTGlobalsKnownValues is to interpret how many Max BVH levels we
need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH
levels.
Fixes: cb423ee6 ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921")
Fixes: c1a44e8d ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41012>
Since e94cb92cb0 ("anv: use internal surface state on Gfx12.5+ to
access descriptor buffers") we're only using the 32bit_index_offset
address format for loads from descriptor memory.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
I ran into this case where genX(cmd_buffer_emit_bt_pool_base_address)
was returning immediately because it considered an RCS engine
emulating a compute queue as neither a render nor a compute queue.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
Probably worked because we could always reach to things through the
binding table and the index was the same.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
For some reason the android builders started noticing...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
For simple copy, we can copy data with uvec4(16bytes) at once.
When we have serialize/deserialize copy mode, we want to copy out the
instance leave address which are 8byte wide, so we need to jump with
8byte stride instead of 16bytes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40966>
We need to use these registers on another file and I don't want to add
another copy of their definition to our code base.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40937>
genX_cmd_compute.c has 2 places that is had a code very similar to
anv_shader_get_scratch_surf() but we could not make use of this function without
change it parameters.
Now it takes the shader stage and the total_scratch instead of anv_shader because
cmd_buffer_trace_rays() don't have a shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We will need to call get_scratch_surf() from other files, so here removing the
static and adding it to anv_private.h.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
When per-primitive padding is needed, max_push_buffers is set to 3
(instead of 4) to reserve the last slot for it.
The assert was requiring `n_push_ranges < max_push_buffers`, which
incorrectly fired when the 3 ranges were used.
Fixes: a8ba682919 ("anv: assert we haven't gone over the maximum number of push_buffers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15155
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40803>
This could override data allocated by the application when shader code
is loaded from binary in vkCreateShaderObjectEXT().
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d39e443ef8 ("anv: add infrastructure for common vk_pipeline")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40727>
Thanks to Konstantin for pointing out that we really don't need atomics
here. We can use the IR offset to get the slot and keep stuffing the
instance address in it. Header already writes the instance count for us.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40734>
We have 4 image intrinsic variants now. This enum is useful for
nir_rewrite_image_intrinsic() and it will be used by other NIR passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40709>
The only possible values are:
- VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
- VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
- VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40670>
The current implmentation adjust the mmap() parameters to make it work, but that
causes us to map more addresses than application asked what could cause us to
overwrite other application mmaps().
So here we export the slab parent as a dma-buf, then do the mmap with almost no
adjustment, the only change is the offset that needs to include the difference
between bo address and slab bo parent address.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40441>