If the (vbi, divisor) tuple matches, we can save an attribute buffer
descriptor. We do the linking at CSO create time. This should be a bit
more cache friendly.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Broken in several ways. Hide it until we can get this sorted, and have a
test plan to keep it sorted.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
It's too complicated and probably for no actual benefit. The main reason
we have BGR formats is for display, but that's export and doesn't get
hit by this path. Internal BGRA textures are possible with a Mesa
extension but sufficiently rare that I regret suggesting this as a
possible optimization. My apologies, and thanks for the fish.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Reduces RA memory footprint by 4x, fixing an OOM in the following dEQP
test that otherwise would allocate 8GB of memory...
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
If the shader packs multiple varyings into the same location with
different location_frac, we'll need to lower to a single varying store
that collects all of the channels together. This is not trivial during
code gen, but it is trivial to do in NIR right before codegen by relying
on nir_lower_io_to_temporaries. Since we're guaranteed all varyings will
be written exactly once, in the exit block, we can scan the shader
linearly and collect stores together in a single pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
We have no native way to swizzle out a nonzero component in a load, but
we can simply load extra components and do the swizzle in shader
instructions. This is inefficient, since it loads data to discard
immediately, but it's required for conformance in some edge cases.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
We need to offset by the number of attributes, since the primary
attribute table is shared for images and vertex attributes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Now that preloading is handled correctly, there's nothing 'special'
about R59 and up, so this gets us a few more registers to work with.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Now that we have affinity masks in RA, we can handle this as an easy
case of register liveness analysis, rather than creating pseudo-nodes
and trying hard to coalesce the resulting moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
I want to use these fields for a similar purpose in the register
allocator, so they won't be zero anymore for scheduling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
If we know that a value is killed in the next tuple, there is no need to
write it out to the register file. We already handled this as a packing
fixup. However, avoiding this write also frees up an extra slot in the
register block, which offers additional scheduling freedom. To take
advantage of this, we extend liveness analysis to work while scheduling,
and modify the schedulable predicate accordingly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Flag day change to swap the order of the "scheduler" with the register
allocator. This gives RA much more freedom without significantly
hndering bundling.
It also opens up the door to Adult-level Scheduling which would occur
prior to bundling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
gcc 11 dixit:
In function ‘sample_2d_ewa’,
inlined from ‘sample_lambda_2d_aniso’ at ../src/mesa/swrast/s_texfilter.c:1995:10:
../src/mesa/swrast/s_texfilter.c:1729:13: warning: ‘sample_2d_nearest’ reading 16 bytes from a region of size 8 [-Wstringop-overread]
1729 | sample_2d_nearest(ctx, samp, img, newCoord, rgba);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/mesa/swrast/s_texfilter.c: In function ‘sample_lambda_2d_aniso’:
../src/mesa/swrast/s_texfilter.c:1729:13: note: referencing argument 4 of type ‘const GLfloat *’ {aka ‘const float *’}
Indeed, newCoord is GLfloat[2] but the argument is typed GLfloat[4],
even though only the first two (s and t) are ever read. Fix the array
size in the function signature to reflect the maximum element actually
addressed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11273>
The only case RTTI is used in nouveau is type assertion at:
File src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:
assert(typeid(*i) == typeid(*this));
This assertion is used 'to be on the safe side' only and not mandatory.
In Android we do not have rtti for libLLVM therefore this assertion
has to be skipped.
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11069>
Fixes these new gcc11 warnings:
In file included from ../src/mapi/glapi/glapi_dispatch.c:174:
src/mapi/glapi/gen/glapitemp.h:3191:68: warning: argument 1 of type 'const GLdouble *' {aka 'const double *'} declared as a pointer [-Warray-parameter=]
3191 | KEYWORD1 void KEYWORD2 NAME(LoadTransposeMatrixd)(const GLdouble * m)
| ~~~~~~~~~~~~~~~~~^
In file included from ../src/mapi/glapi/glapi_priv.h:31,
from ../src/mapi/glapi/glapi_dispatch.c:40:
../include/GL/gl.h:1901:62: note: previously declared as an array 'const GLdouble[16]' {aka 'const double[16]'}
1901 | GLAPI void GLAPIENTRY glLoadTransposeMatrixd( const GLdouble m[16] );
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11198>
attempting to read the inlined uniforms directly after the variant key
using the size of the variant is not going to work since the variant union
is (sometimes) much larger than the size of the actual struct being used,
meaning that this would just copy a bunch of zeroes instead of the actual
inlined uniforms
Fixes: 7f28775edc ("zink: implement uniform inlining")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11003>
Without it the hardware launches an IB2 which might hang in some
rare situations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11214>
It's illegal to emit DRAW_{INDEX}_INDIRECT_MULTI from an IB2 on GFX7.
PAL applies this workaround for indirect dispatches and also on
GFX8-9 but it doesn't seem needed.
This fixes various GPU hangs on Bonaire (GFX7).
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11214>