Some raytracing tests are allocating lots of buffer and because of our
2Mb alignment restriction on local memory, we're running our of VMA...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Zhang, Jianxun <jianxun.zhang@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
Things are a bit confusing because we use the term "scratch" for 2
different things :
* the buffer for register allocation spilling
* the buffer for storing live values between splitted shaders around shader calls
Here we're fixing the missing register allocation spilling buffer.
v2: update comments (Caio)
fix scratch bo size computation with pipeline libraries (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
This was only necessary for gen7 platforms that no longer support by
anv.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18601>
Intel has different Z interpolation float point rounding
than other mesa gpus
For example gl_Position.z = 0.0 will be interpolated to
gl_FragCoord.z = 0.5 for all gpus
gl_FragCoord = -0.00000001 will be interpolated to
gl_FragCoord.z = 0.4999999702 for Intel
and rounded to gl_FragCoord.z = 0.5 for other gpus
Games with LEQUAL depth func will fail depth test on Intel
and will pass it on other gpus in such case
This workaround lowers translated depth range
and several gl_FragCoord.z coords with extra small difference
will be translated to the same UINT16\UINT24\UINT32
value of an integer depth buffer
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7199
Signed-off-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18412>
We found a perf regression with 9027c5df4c ("anv: remove the
LOCAL_MEM allocation bit") which seems to be that we over subscribe
local memory, leading i915 to swap things in/out too much.
This change avoid putting buffers in local memory if they are not
allocated from a DEVICE_LOCAL heap.
Maybe we can revisit this later if i915 is better able to deal with
more buffers in local memory.
v2: Remove implicit_css from anv_bo when not in lmem (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9027c5df4c ("anv: remove the LOCAL_MEM allocation bit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7188
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18395>
We don't need the relocation offsets anymore, and just want to pin the
BO, and combine the address into a uint64_t. We can just open code
those two things; it's actually less code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18208>
We don't need the offset to write a relocation at any longer, so all
it does is call anv_reloc_list_add_bo() at this point. Just use that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18208>
We now always softpin and use the load_global_constant case, so there's
no need to set up a UBO for NIR constants.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18208>
This was only needed on Haswell and older due to the kernel command
parser not allowing us to chain batches. anv no longer support this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18208>
Instead of this fragile use_primitive_replication bit which we set
differently depending on whether or not we pulled the shader out of the
cache, compute and use the information up-front during the compile and
then always fetch it from the vue_map after that. This way, regardless
of whether the shader comes from the cache or not, we have the same flow
and there are no inconsistencies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17602>
These functions are no longer used since the introduction of common
dispatch.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18284>
We always want to use local memory if possible, we'll just add the
system memory heap if the buffer needs to be host visible.
v2: Drop some usages of ANV_BO_ALLOC_LOCAL_MEM_CPU_VISIBLE
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17873>
Each logical device can point to its physical device intel_device_info
saving at least one intel_device_info.
This also allow us to set 'const' to avoid values in intel_device_info
being changed by mistake.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17897>
v2: Drop ANV_PIPE_UNTYPED_DATAPORT_CACHE_FLUSH_BIT from invalidate bits (Lionel)
Add utrace support
Expand on comment about PIPE_CONTROL::UntypedDataPortCache
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
Instead of having 2 VkMemoryType pointing to the same VkMemoryHeap, we
have each VkMemoryType with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (one
host visible, the other not) point to its own VkMemoryHeap. For the
local heap that is host visible, we'll use the
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag at GEM BO creation.
When the smallbar uAPI is not available we fallback to a single heap
and do not use I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS.
v2: Handle probed_cpu_visible_size == probed_size (Matthew)
v3:
* Jordan: Use region info from devinfo
v4: Also make the vram host visible heap as local (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739>
When we made depth/stencil dynamic, we lost the optimization. This is
particularly important for cases where the stencil test is enabled but
never writes anything as certain combinations with discard can cause
the stencil write (which doesn't do anything) to get moved late which
can be a measurable perf hit. According to 028e1137e6 ("anv/pipeline:
Be smarter about depth/stencil state", it was a couple percent for DOTA2
on Broadwell back in the day. No idea how it affects current titles.
This may also improve the depth/stncil PMA workarounds on Gen8 and Gen9
since they're now looking at optimized depth/stencil state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>