Instead of doing a complicated hack with the POT divisor, just zero the
stride of the linear attribute buffer like we do on the CPU.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
The function is complicated enough as it is -- hide the bit twiddling
behind a helper function.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
If the (vbi, divisor) tuple matches, we can save an attribute buffer
descriptor. We do the linking at CSO create time. This should be a bit
more cache friendly.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Broken in several ways. Hide it until we can get this sorted, and have a
test plan to keep it sorted.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
It's too complicated and probably for no actual benefit. The main reason
we have BGR formats is for display, but that's export and doesn't get
hit by this path. Internal BGRA textures are possible with a Mesa
extension but sufficiently rare that I regret suggesting this as a
possible optimization. My apologies, and thanks for the fish.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Reduces RA memory footprint by 4x, fixing an OOM in the following dEQP
test that otherwise would allocate 8GB of memory...
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
If the shader packs multiple varyings into the same location with
different location_frac, we'll need to lower to a single varying store
that collects all of the channels together. This is not trivial during
code gen, but it is trivial to do in NIR right before codegen by relying
on nir_lower_io_to_temporaries. Since we're guaranteed all varyings will
be written exactly once, in the exit block, we can scan the shader
linearly and collect stores together in a single pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
We have no native way to swizzle out a nonzero component in a load, but
we can simply load extra components and do the swizzle in shader
instructions. This is inefficient, since it loads data to discard
immediately, but it's required for conformance in some edge cases.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
We need to offset by the number of attributes, since the primary
attribute table is shared for images and vertex attributes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Now that preloading is handled correctly, there's nothing 'special'
about R59 and up, so this gets us a few more registers to work with.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Now that we have affinity masks in RA, we can handle this as an easy
case of register liveness analysis, rather than creating pseudo-nodes
and trying hard to coalesce the resulting moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
I want to use these fields for a similar purpose in the register
allocator, so they won't be zero anymore for scheduling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
If we know that a value is killed in the next tuple, there is no need to
write it out to the register file. We already handled this as a packing
fixup. However, avoiding this write also frees up an extra slot in the
register block, which offers additional scheduling freedom. To take
advantage of this, we extend liveness analysis to work while scheduling,
and modify the schedulable predicate accordingly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Flag day change to swap the order of the "scheduler" with the register
allocator. This gives RA much more freedom without significantly
hndering bundling.
It also opens up the door to Adult-level Scheduling which would occur
prior to bundling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
gcc 11 dixit:
In function ‘sample_2d_ewa’,
inlined from ‘sample_lambda_2d_aniso’ at ../src/mesa/swrast/s_texfilter.c:1995:10:
../src/mesa/swrast/s_texfilter.c:1729:13: warning: ‘sample_2d_nearest’ reading 16 bytes from a region of size 8 [-Wstringop-overread]
1729 | sample_2d_nearest(ctx, samp, img, newCoord, rgba);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/mesa/swrast/s_texfilter.c: In function ‘sample_lambda_2d_aniso’:
../src/mesa/swrast/s_texfilter.c:1729:13: note: referencing argument 4 of type ‘const GLfloat *’ {aka ‘const float *’}
Indeed, newCoord is GLfloat[2] but the argument is typed GLfloat[4],
even though only the first two (s and t) are ever read. Fix the array
size in the function signature to reflect the maximum element actually
addressed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11273>
The only case RTTI is used in nouveau is type assertion at:
File src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:
assert(typeid(*i) == typeid(*this));
This assertion is used 'to be on the safe side' only and not mandatory.
In Android we do not have rtti for libLLVM therefore this assertion
has to be skipped.
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11069>