The command streamer will blindly prefetch up to 4KiB ahead of a batch buffer
depending on the engine. To avoid page faults with the scratch page disabled,
we can create a special VMA heap for batch buffers that has pages initialized
with the null tile bit by default.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Both anv_buffer_view->vk.range and VkDescriptorAddressInfoEXT->range
are VkDeviceSize, which is uint64_t. In Anv, we pass this to
align_down_npot_u32(), anv_fill_buffer_surface_state() and
anv_fill_buffer_view_surface_state(), all which convert it down to
uint32_t. Then we call isl_buffer_fill_state(), converting the value
back to uin64_t as size_B.
Remove the intermediate u32 truncation everywhere. If some place does
not accept values bigger than UINT_MAX, it is that place that should
have a check. We shouldn't silently convert a u64 value to u32 and
then back to u64.
I'm not aware or any workloads that are affected by this bug today.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41251>
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist. They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.
In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup. Remove/Update some stale
comments since we are here.
Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
BTD unit will keep accumulating the threads and then eventually dispatch
those active threads once it reaches the counter.
I guess dispatching too fast will not have full occupancy at the BTD
unit, instead we just pick the half of max value for counter.
This patch also add drirc option to dispatch_timeout_counter and tweak
values internally with respect to HW limits. Default value we have right
now is 512 clocks, we can for sure tune it per app.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>
This can be used to workaround problem cases with application
controlled subgroup size.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40813>
../src/intel/vulkan/genX_init_state.c: In function ‘gfx9_CreateSampler’:
../src/intel/vulkan/genX_init_state.c:1507:40: warning: ‘border_color_offset’ may be used uninitialized [-Wmaybe-uninitialized]
1507 | sampler_state.BorderColorPointer = border_color_offset;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41116>
With SPV_KHR_constant_data, it's allowed to specialize array of
constants.
RustiCL changes are from Karol Herbst <kherbst@redhat.com>.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>
This patch enables dynamic stack ID control on Xe3+.
Programmed values are the recommended settings from the Bspec.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41066>
This will make this function more generic allowing us to use it for
COMPUTE_WALKER_2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>
CBV resources are supposed to be 256B aligned
(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).
vkd3d-proton will puts CBV addresses in the push constant data and do
global loads on them. Unfortunately those loads don't have a 256B
alignment value on them. So when looking at what we can promote to HW
push buffers, we can't consider them.
This change introduces a detection pass for CBV resources (according
to vkd3d-proton devs those are 64KiB in size) and realign the loads to
be 256B aligned.
This is only enabled on DX emulation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>
StackSizePerRay is the RTDispatchGlobals::AsyncStackSize and
DisableRTGlobalsKnownValues is to interpret how many Max BVH levels we
need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH
levels.
Fixes: cb423ee6 ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921")
Fixes: c1a44e8d ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41012>
Since e94cb92cb0 ("anv: use internal surface state on Gfx12.5+ to
access descriptor buffers") we're only using the 32bit_index_offset
address format for loads from descriptor memory.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
I ran into this case where genX(cmd_buffer_emit_bt_pool_base_address)
was returning immediately because it considered an RCS engine
emulating a compute queue as neither a render nor a compute queue.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
Probably worked because we could always reach to things through the
binding table and the index was the same.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
For some reason the android builders started noticing...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
For simple copy, we can copy data with uvec4(16bytes) at once.
When we have serialize/deserialize copy mode, we want to copy out the
instance leave address which are 8byte wide, so we need to jump with
8byte stride instead of 16bytes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40966>
We need to use these registers on another file and I don't want to add
another copy of their definition to our code base.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40937>