We should have never been doing this as bind time. Instead, layout
transitions out of UNDEFINED are in the spec specifically so the
driver has a point where it can do initialization, so do our init there
instead.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41275>
The NDK api __android_log_print has been available since api level 3,
which is preferred since NDK api is more stable.
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41254>
The NDK api __android_log_write has been available since api level 3,
which is preferred since NDK api is more stable. Meanwhile, use write
instead of print to avoid extra internal copy/truncate involved in the
print helper.
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41254>
Added a --scratch flag instead of always forcing the scratch page enabled, this
allows the hang replay tool to be used to debug page faults.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Took inspiration from RADV to make nir_opt_load_store_vectorize robust against
page faults, by checking the align_offset and align_mul to see if any extra
components could be overlapping into a different page.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
We can just conditionally replace the address with an address to a zero
initialized cacheline if the read is going to go out of bounds.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
First, the surface dimensions are used to determine the range of valid
pages that the data in the buffer overlaps, then rows are removed from
the surface until it does not overfetch into any neighboring pages. If
any rows were removed, an extra BTI is set up with a texel buffer that
views the contents of all the rows that were removed, and the shader is
compiled with a branch to sample the last rows through the texel buffer
instead of the main surface.
Using the texel buffer allows it to access the last rows without dealing
with overfetch or weird alignment hacks, and restricting texel buffer
usage to just the part of the surface that can't be accessed safely
ensures that we don't significantly impact performance for any buffer to
image copy that is unlucky enough to be close to a page boundry.
Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Consolidates the logic for calculating the intratile extent of a slice of a
surface to avoid duplicating code in the next patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Adds a function to calculate the total size of a 2D linear sampling engine
surface, including overfetch, for a buffer to image copy.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Since we now have a ISL_SURF_USAGE_NO_OVERFETCH_PADDING_BIT flag to turn extra
padding calculations on and off, we can align the row pitch of linear surfaces
that are accessed through the sampler to minimize the number of L3 cachelines
that each sampler cacheline overlaps for added efficiency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Bspec 58779 describes various cases where additional padding is required on the
bottom and right sides of a sampling engine surface to avoid page faults.
Since we don't want to mess up the other drivers that also use ISL, there's now
a requires_padding boolean in isl_dev that can be used to enable/disable the
extra padding calculations per device and driver.
The extra padding can also be disabled per-surface by adding the usage flag
ISL_SURF_USAGE_NO_OVERFETCH_PADDING_BIT, like when a specific row pitch is
needed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
When sampling BUFFER, 1D, or 2D surfaces, with no MSAA, no mipmap levels,
linear tiling, and SurfaceArray set to false, the surface padding
requirements are relaxed and its much easier to use the sampler to do
buffer-to-image copies in BLORP. We can't have it like this by default
though because we need SurfaceArray true for robustness.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Like the command streamer, the EUs will also blindly prefetch up to 3.5KiB
ahead of a shader. We can manage this in the shader heap by adding the
required padding when we allocate the buffers to back a shader allocation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
The command streamer will blindly prefetch up to 4KiB ahead of a batch buffer
depending on the engine. To avoid page faults with the scratch page disabled,
we can create a special VMA heap for batch buffers that has pages initialized
with the null tile bit by default.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Add a convenience helper function, and use this for checking
whether queries are active. This fixes a bug where we were
basically always using a software fallback for indirect
rendering rather than the CSF hardware loop.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41141>
Ensure the INDEX_OFFSET CSF register is set to 0 before an ordinary
draw. This is normally the case, but indirect draws can change it.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41141>
Maintenance9 will require us to make unbound vertex inputs (that is,
attributes in the VS without a corresponding binding at the same
location) be defined. In order for this to work, VFD_FETCH_INSTR_INSTR
must be defined for all attributes used in the shader. Imagine we do
something like:
CmdSetVertexInputs(only 1 input at location 0)
CmdBindPipeline(VS reads only location 0)
CmdDraw()
CmdBindPipeline(VS reads locations 0 and 1)
CmdDraw()
For the first draw we only need to emit VFD_FETCH_INSTR_INSTR[0], for the
second draw we need to emit VFD_FETCH_INSTR_INSTR[1] as well in the VI
draw state. This unfortunately means we have to do draw-time validation
for vertex input state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40552>
The kernel already does this for us by zeroing new BOs. It's also
unnecessary, unless the newly-introduced
VK_QUERY_POOL_CREATE_RESET_BIT_KHR flags is used. If we ever start
suballocating query pools, we may have to zero based on that flag, but
for now we don't have to do anything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40552>
Before we were falling back to always emitting a pipeline barrier, which
effectively kills any point of having the event. But with sync2 and the
guarantee that src/dst dependency infos match, we can instead emit the
flushes before writing the event and actually use the event as intended.
As a bonus, this also allows the BV to run ahead of the BR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40552>
We used this for two different purposes with different caching
requirements:
- it's always needed for LLVM, and needs to be part of the cache key
- it's needed for disassembly with ACO, and shouldn't be part of the cache
key
Eventually, we'll want the family to only be part of the cache key if LLVM
is used, but still accessable for when ACO needs the disassembler.
If we put it in radv_compiler_info::debug, we'll need to treat that
specially to hash it into the key when LLVM is used.
If we put it in radv_compiler_info::key, that will hash it into the key
unnecessarily if ACO is used and disassembly might be needed.
So just put the family in both, and use debug::family for disassembly and
key::family for LLVM.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41261>
Dynamic rendering codepath allows binding an attachment with a
depth+stencil format, but only depth or stencil active. The
corresponding test should be disable in such case.
Ignore the attachment's depth or stencil according to the rendering
attachment info's is_depth and is_stencil variables.
Fixes the following CTS testcases:
dEQP-VK.pipeline.monolithic.stencil.no_stencil_att.dynamic_rendering.static_enable.d24_unorm_s8_uint
dEQP-VK.pipeline.monolithic.stencil.no_stencil_att.dynamic_rendering.static_enable.d32_sfloat_s8_uint
dEQP-VK.pipeline.monolithic.stencil.no_stencil_att.dynamic_rendering.dynamic_enable.d24_unorm_s8_uint
dEQP-VK.pipeline.monolithic.stencil.no_stencil_att.dynamic_rendering.dynamic_enable.d32_sfloat_s8_uint
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41054>
to lower bindless handles to 32 bits in the GLSL compiler and eliminate
input loads of high 32 bits of bindless handles before nir_opt_varyings.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41170>
to lower bindless handles to 32 bits before nir_opt_varyings, so that
the high 32 bits of (input) loads of bindless handles are eliminated early.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41170>
This just separates tex coord lowering into a new pass.
The gfx_level parameter is now unused in ac_nir_lower_image_tex, but I'm
keeping it because it will be used in the future.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41173>
Mesa's common logging framework supports Android already - switch
to it.
Note: on non-Android platforms this changes the message format -
the "libEGL" prefix is gone, replaced by mesa_log's "MESA-EGL" tag.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33975>
Now that we stash the offset and alignment, this helper is redundant.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Backport-to: 26.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40473>