Implement custom border colours, as required by OpenGL's CLAMP_TO_BORDER and
Vulkan with customBorderColor. This uses an extended sampler descriptor, which
has space for the custom border values. The trouble is that the border must be
packed into an internal interchange format that depends on the original format
in a complex way. That said, we're not solving NP-complete problems here, and it
passes the tests (dEQP-GLES31.functional.texture.border_clamp.* and piglit
texwrap).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20570>
Vulkan spec 18.8. Primitives Generated Queries:
When a generated primitive query for a vertex stream is active,
the primitives-generated count is incremented every time a
primitive emitted to that stream reaches the transform feedback
stage, whether or not transform feedback is active.
We can see the order of stages in chapter 27 Fixed-Function
Vertex Post-Processing, which shows that the transform feedback
stage is before rasterization (and therefore culling).
Conclusion is that culled primitives should be included
in the primitives generated query.
This commit makes sure to emit the primitives generated query
code before culling and uses the input primitive count passed
to the current wave instead of the exec mask after culling.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21037>
Match iadd(x, #y). The format shift will get constant-folded away and, if y
is sufficiently small, the constant will be inlined by the AGX backend
optimizer. This gets rid of piles of 64-bit arithmetic from lowering UBOs. It
probably doesn't matter for perf since that's happening in preamble shaders but
it *is* noisy.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21108>
The offset is in vec4s, not words (unlike the component). This doesn't matter
right now since we get everything lowered (offset -> 0) but it will come up if
we implement clip distances natively (instead of lowering in FS).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21097>
These implement gl_ClipDistance in hardware, avoiding the fragment shader
lowering. Unfortunately, they can't be disabled on a per-plane basis and they
can't be interpolated, so using them for OpenGL would still require a bunch of
extra lowering steps. Still, we should document the hardware and the caveats.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21097>
The hardware doesn't extend in this case, we need to extend for it. This
fixes 32-bit render target formats with lower_mediump_io.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
This works around bugs in a LOT of applications, since fp16 texture coordinates
are almost never appropriate even though it's a valid implementation of the GLES
spec. It also doesn't seem to matter for perf.
Code from the Bifrost compiler which implements the same workaround for slightly
different reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
This speeds up glReadPixels. Instead of reading from the write-combined
framebuffer and converting colours on the CPU, this blits on the GPU to a
writeback staging resource with the colour conversion for free, and memcpies
from the writeback staging resource on the CPU.
In general, due to textures being write combined and tiled/compressed by default
by staging resources being linear writeback, blit-based texture transfer should
win out (you were going to blit anyway), particularly when format conversion is
involved
33% reduction in wall clock time for grim at 4K. No change in deqp-gles2
runtime.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21063>
When playing the My Little Pony theme song at 1080p on T8103, with mpv's GPU
compositing but software decoding, CPU usage drops from 200% to 50% due to
proper caching of the staging resource.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21063>
We only need 4 byte alignment, not 8 bytes. This isn't a big difference in
practice, but it probably reduces padding in some cases. More importantly, it
corrects our XML to match what the hardware actually does, which is great.
(There is exactly enough room for a 40-bit address with 4 byte alignment.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
Check the size explicitly, instead of just implicitly in the GenXML pack: it is
the responsibility of the caller to split up larger uploads. While this is
nominally more complicated, agx_usc_uniform is called in the draw hot path
whereas the actual splitting decision can usually be done at compile-time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
There's a corner case where 3D textures have extra padding compared to 2D
arrays. We need to communicate that to ail.
Fixes
dEQP-GLES3.functional.texture.specification.texstorage3d.size.3d_32x16x64_4_levels.
That test now uses the same layout as Metal.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
This has a subtle interaction with page-aligned layers. Written while debugging
dEQP-GLES3.functional.texture.filtering.cube.combinations.nearest_nearest_repeat_clamp
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
The mipmapped_z = true case is checked against Metal, the false case is smoke
testing the old behaviour (which is still used for 2D arrays).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
For 3D images, the full miptree depends on the depth of the image, in contrast
to 2D arrays. We need to account for this to calculate the correct layer
strides.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
This is to inform you of some planned downtime in the LAVA lab as follows:
Start: 2023-02-04 06:00 GMT
End: 2023-02-06 12:00 GMT
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21119>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
There were only two users. Replace each with nir_fneu instead.
This is now a squash of what was two separate commits.
nir_lower_pstipple_block is called after nir_lower_bool_to_int32, so
nir_fneu32 has to be used here or there will be regresssions in stipple
tests on llvmpipe.
v2: Rebase on !20869.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
it's annoying to validate this at runtime since it has to happen during draw,
but storing the "usable" ds3 mode separately from the pipeline state should
be a reasonable enough compromise for perf here...hopefully
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21100>
apparently an actual null descriptor is illegal here, and it's wasted cpu
anyway, so just cache the dummy surface on init and use that data when
fbfetch isn't active but the layout requires it
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21100>
When skiavk is the default system ui renderer, venus icd gets preloaded
into Zygote. However, Zygote access to render node is normally denied by
selinux except for legacy bootanimation purpose. This change fixes venus
icd loading to avoid invoking cros gralloc driver loading by moving the
perform op outside, so that we still get the memory footprint win.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21107>
Instead of generating num_components*simdwidth scattered stores, if
there's no indirect then we can just look up the pointer to the
base_offset and do a simd store there.
dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i64vec4 goes
from 30s to ~2s.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21084>
This now matches how they get dereffed by get_soa_array_offsets() -- each
array element has num_components vecs inside of it, rather than each
components has an array in it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21084>
We first do an incomplete check for whether the linker supports
--gc-sections, then potentially add C and C++ arguments assuming that it
works, then later do a complete check to see if it actually works and
use --gc-sections. This means we can end up putting functions and data
in separate sections when we can't gc them.
Combine the checks, do less work, and be more accurate.
fixes: f51ce21e4e
("meson: Drop adding -Wl,--gc-sections to project c/cpp arguments.")
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21083>
This should be way faster to compile by not spamming so many loops at
LLVM, and faster to execute if LLVM didn't figure out what that loop
meant.
It looks vector reduce ops aren't really a thing, just a convenience in
the IR. We should be able to do better by counting zeroes in the
exec_mask != 0 result.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21001>