We have an SR for it, which can save a bit of math. This came up while working
on the spiller.
total instructions in shared programs: 1778396 -> 1778376 (<.01%)
instructions in affected programs: 3036 -> 3016 (-0.66%)
helped: 10
HURT: 3
Instructions are helped.
total bytes in shared programs: 12185182 -> 12185018 (<.01%)
bytes in affected programs: 38640 -> 38476 (-0.42%)
helped: 18
HURT: 2
Bytes are helped.
total halfregs in shared programs: 531218 -> 531174 (<.01%)
halfregs in affected programs: 471 -> 427 (-9.34%)
helped: 6
HURT: 0
Halfregs are helped.
total threads in shared programs: 18909056 -> 18909184 (<.01%)
threads in affected programs: 1280 -> 1408 (10.00%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
Consumers might not pass through the modifier information in this case.
Fixes XWayland/mutter using dma-buf v4 feedback (though the fact they
try to use implicit modifiers is likely a bug on their end, and will
decrease performance).
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
Don't call lower_mediump_io for no16. This is helpful for debugging and soon
driconf-shaming apps with broken precision qualifiers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
Apps can have pathological use cases where huge resources are shadowed
repeatedly. An app that alternately writes to a resource and then uses
it to draw can create an unbounded amount of shadow BOs.
To fix this, introduce both a maximum resource size for shadowing, and a
maximum cumulative size that resource may be shadowed before we start
flushing readers. The flush path then clears the counter, as does the
happy path where there are no readers left after flushing writers.
Fixes massive memory bloating in Firefox and probably others.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
BOs can be oversized, as they can come from the BO cache. Make sure to
always use the resource layout size, not the BO size, when we need this
for some reason.
This fixes BO shadowing creating overlarge BOs, and also the attachment
size for submissions (probably doesn't matter, but it's more correct now).
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
These should "just" work, promoting the 8-bit channels to 16-bit registers
internally, allowing us to use our 8-bit stores with 8-bit data vectors packed
in 16-bit registers. All other non-conversion ALU gets lowered by the previous
patch, this is just needed for simple things like nir_op_vec of lowered math
passed to a vectorized store.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
These have no real Vulkan or Gallium dependence and are (as such) useful for
both VK and GL without any real change in level of abstraction. Do the code
motion.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
When NV21 lowering with hardware sampling and shader CSC was added, the
incorrect PIPE_FORMAT_G8_B8R8_UNORM was used. That format is supposed to
represent vulkan NV12 instead.
This commit introduces PIPE_FORMAT_R8_B8G8_UNORM, which correctly describes the
gallium mapping for YUV CSC, with R as Y, instead of G as Y.
Fixes: 26e3be513d ("gallium/st: add support for PIPE_FORMAT_NV21 and PIPE_FORMAT_G8_B8R8_420")
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24266>
YV12 is the same as DRM_FORMAT_YVU420.
We lower it to PIPE_FORMAT_R8_B8_G8_420, which is equivalent to
PIPE_FORMAT_R8_G8_B8_420 with U/V planes swapped.
This is used for hardware that can sample from YUV but need CSC in shader.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24266>
This new format is similar to PIPE_FORMAT_G8_B8_R8_420, but with R as Y, G as U
and B as V. The need for two diferent formats here is because gallium maps the
YUV channels differently from vulkan.
Some hardware, e.g. Mali GPUs, can sample from I420 but need CSC in shader,
this patch implements that.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24266>
The SPIR-V spec says:
If Value is positive with a magnitude too large to represent as a
16-bit floating-point value, the result is positive infinity.
If Value is negative with a magnitude too large to represent as a
16-bit floating-point value, the result is negative infinity.
This is only the case for rtne v_cvt_f16_f32
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24617>
Add a lowering pass that lowers interpolation to math on the coefficient
registers. This handles interpolateAtOffset, as well as flat shading as an easy
special case.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24498>
When TCS isn't linked with VS, the vertex stride should be computed
from vertex outputs. This is only for shader object and shouldn't
change anything right now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24540>
With shader object, if VS and TCS aren't linked together, the LSHS
vertex stride should be computed from the vertex outputs. Otherwise,
if an output is unused, the stride is wrong in TCS.
This is currently for GFX8 only because for merged shaders this won't
be needed but shader object on GFX9+ isn't yet a thing.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24540>
According to WA description, we need to track DS write state
and emit a PSS_STALL_SYNC whenever that state changes.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18411>
According to WA description, we need to track DS write state
and emit a PSS_STALL_SYNC whenever that state changes.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18411>
This is required for Wa_18019816803 when blorp emit DS state.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18411>
This patch removes start_slot from set_vertex_buffers() as suggested in
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8142
compilation testing:
all gallium drivers, nine frontend compilation has been tested.
d3d10umd compilation has not been tested
driver, frontend testing:
only llvmpipe and radeonsi driver was tested running game
only the nine frontend changes are complex. All other changes are easy.
nine front end was using start slot and also using multi context.
nine frontend code changes:
In update_vertex_elements() and update_vertex_buffers(), the vertex
buffers or streams are ordered removing the holes. In update_vertex_elements()
the vertex_buffer_index is updated for pipe driver to match the ordered list.
v2: remove start_slot usage code from Marek (Marek Olšák)
v3: nine stream number holes mask code from Axel (Axel Davy)
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (except nine, which is Ab)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22436>
Currently, negative array texture indices get saturated to 0 which,
while technically in-bounds, isn't what we want for Vulkan with image
robustness or robustness2. Vulkan requires that a negative index on a
texelFetch() count as out-of-bounds but a negative index on any other
texture operation gets clamped to 0. (See the spec section entitled
"(u,v,w,a) to (i,j,k,l,n) Transformation And Array Layer Selection").
Instead of using CVT for TXF, we now take U32 MAX with 0xffff. Because
it's unsigned, this ensures that negative array indices clamp to 0xffff
and will be considered out-of-bounds by the hardware (there are a
maximum of 2048 array indices in an image descriptor). For everything
other than TXF, we keep using an F32->U16 conversion but add a saturate.
This ensures that negative array indices clamp to 0 as per the Vulkan
spec. Very large indices will clamp to 0xffff which the hardware will
clamp to the maximum array index.
This fixes 324 tests in the dEQP-VK.robustness.* group, all those for 1D
and 2D array textures
Acked-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24593>
ARB shaders have no rules restricting i/o interfaces since it's assumed
that they'll match by name. given that mesa marks these all as separate
shaders, a separate path is needed to ensure these variables correctly
match up their i/o even when it's mismatched
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24608>
The presentation timing extension is used for doing WaitForPresent
properly, but we accidentally bind it after an early return intended to
stop us from binding dmabuf when software rendering.
Remove the early return.
cc: mesa-stable
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24588>