Follow-up to previous commit, this time to fix encoding/decoding for
VkDrmFormatModifierPropertiesListEXT::drmFormatModifierCount. Fixes a
workaround (WA1) in the venus-protocol.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21395>
If a dual source blend colour is never written, src1 will be null and it will be
invalid to dereference it. src1 is dereferenced both for the f2fN instruction
but also if a dual blend factor is used... even if the latter isn't strictly
valid, segfaulting in the NIR pass seems a lot meaner than blending with zero.
The referenced commit hosed Asahi, causing anything that used blending to crash.
Panfrost is unaffected since it always supplies a dual colour due to our crude
construction of blend shaders.
Fixes: 8313016543 ("nir/lower_blend: Consume dual stores")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21544>
This is used to replace c_sse2_arg and c_sse2_args in latter commits
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21375>
RGP traces need a dump of shader code in order to display ISA and
instruction trace. Previously, this was read back from GPU at trace
creation time. However, for future changes that implements upload shader
to invisible VRAM, the upload destination will be a temporary staging
buffer and will be only accessible during shader creation.
To allow dumping in such cases, copy the shader code to a separate buffer
at creation time, if thread tracing is enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21513>
some drivers may provide this in heaps that get used by non-staging resources,
so avoid extra copies in that case
unlike the previous attempt at this optimization, this utilizes the screen-based
context for thread-safe transfers, which should avoid races/crashes
fix#8171
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21452>
G13 does not support sampler descriptor LOD biasing, so this needs to be lowered
to shader code for APIs that require this functionality. Add an option to do
this lowering while doing our other backend texture lowerings. This generates
lod_bias_agx texture instructions which the driver is expected to lower
according to its binding model.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Track the LOD bias of samplers and upload them at draw time to uniform
registers. This could be optimized in the future.
Vulkan will probably want to pull from a descriptor set instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Add a new texture opcode that returns the LOD bias of the sampler. This will be
used on AGX to lower sampler LOD bias to txb and friends. This needs to be a
texture op (and not a new intrinsic) to handle both bindless and bindful
samplers across GL and Vulkan in a uniform way.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Now that we're working on lowered I/O, passing in the dual source blend colour
via a sideband doesn't make any sense. The primary source blend colours are
implicitly passed in as the sources of store_output intrinsics; likewise, we
should get dual source blend colours from their respective stores. And since
dual colours are only needed by blending, we can delete the stores as we go.
That means nir_lower_blend now provides an all-in-one software lowering of dual
source blending with no driver support needed! It even works for 8 dual-src
render targets, but I don't have a use case for that.
The only tricky bit here is making sure we are robust against different orders
of store_output within the exit block. In particular, if we naively lower
x = ...
primary color = x
y = ...
dual color = y
we end up emitting uses of y before it has been defined, something like
x = ...
primary color = blend(x, y)
y = ...
Instead, we remove dual stores and sink blend stores to the bottom of the block,
so we end up with the correct
x = ...
y = ...
primary color = blend(x, y)
lower_io_to_temporaries ensures that the stores will be in the same (exit)
block, so we don't need to sink further than that ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21426>
Power-of-two SW stack sizes are prone to causing collisions in the
hashing function used by the L3 to map memory addresses to banks,
which can cause stack accesses from most DSSes to bottleneck on a
single L3 bank. Fix it by padding the SW stack stride by a single
cacheline if it was a power of two. This has been reported by Felix
DeGrood to improve Quake2 RTX performance by ~30% on DG2-512 in
combination with other RT patches Lionel Landwerlin has been working
on.
Many thanks to Felix DeGrood for doing much of the legwork and
providing several iterations of Q2RTX performance counter dumps which
eventually prompted me to consider the hash collision theory and
motivated this patch, and for providing additional performance counter
dumps confirming that there is no longer an appreciable imbalance in
traffic across L3 banks after this change.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21461>
Do this by removing the compatibility table and only using hard coded shaders
when present. The hard coded shaders, along with the hard coding framework
itself, can be dropped once the compiler is capable of compiling the hard coded
shaders. In the meantime we don't want to risk regressing things that we know
work because we temporarily can't test them.
This restriction is being dropped now as the new compiler framework has been
merged and we want to make use of it so it can be developed further.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21495>
After 16 entries, we fall back to the previous logic that used a hash
map to link the resource's state per context.
Preventing hash map churn by cheaply tracking up to 16 context's worth
of states per resource significantly reduces CPU cost in
find_or_create_state_entry
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21528>