It seems triangle merging is incompatible with calculating derivatives
along primitive edges correctly. Take the appropriate NIR shader info
flags in the compiler and pass them down as a flag to the driver, so it
can set the disable triangle merging flag (formerly called "lines or
points").
TODO: Is this what macOS does when you set a sample mask there (which
apparently fixes the same bug on the Darwinia Metal backend)? Do we
also need to set this when sample masks are used?
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes Darwinia and dEQP2 projected tests.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
Probably impossible to hit in practice but let's get it right. Found when
forcing RA to use the upper half of the reg file.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
Extract out the code for unpack_64_2x32_split_x and use it for other integer
downcasts too to coalesce out a move. Pointless, but I wanted to have a little
RA fun after getting stencil export working.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
Layer strides are based on the full miptree, and even for single-layer
images macOS always allocates a full one (possibly relevant for
compression). Make sure we do the same, regardless of how many mip
levels the user asked for.
Fixes Darwinia.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b. Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.
At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.
The failing test on d3d12 is a pre-existing bug that is triggered by
this change. I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.
v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.
v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.
v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress. The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).
v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.
v6: Just eliminate the i2b instruction.
v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷
No shader-db changes on any Intel platform.
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2
Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2
The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.
Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
Now we support all the vertex formats! This means we don't hit u_vbuf for format
translation, which helps performance in lots of applications. By doing the
lowering in NIR, the vertex fetch code itself can be optimized by NIR (e.g.
nir_opt_algebraic) which can improve generated code quality.
In my first implementation of this, I had a big switch statement mapping format
enums to interchange formats and post-processing code. This ends up being really
unwieldly, the combinatorics of bit packing + conversion + swizzles is
enormous and for performance we want to support everything (no u_vbuf
fallbacks). To keep the combinatorics in check, we rely on parsing the
util_format_description to separate out the issues of bit packing, conversion,
and swizzling, allowing us to handle bizarro formats like B10G10R10A2_SNORM with
no special casing.
In an effort to support everything in one shot, this handles all the formats
needed for the extensions EXT_vertex_array_bgra, ARB_vertex_type_2_10_10_10_rev,
and ARB_vertex_type_10f_11f_11f_rev.
Passes dEQP-GLES3.functional.vertex_arrays.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19996>
Long term, I think having i2i16 and i2i32 available with 8-bit sources should
make lowering the rest of 8-bit away a bit easier. Short term, this avoids
special casing 8-bit in the VBO lowering code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19996>
The coefficient register is 16-bit so our builder will make the iter 16-bit too
(maybe not the best design...), force fp32 to match the NIR intrinsic.
Fixes glsl-fs-fragcoord-zw-ortho
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20017>
Although the metadata is possibly one byte per 8x4 block, the
logical block size for compression/allocation is a 16x16 block,
so align to that. Also align the initial dimensions to that size,
and change the minification to a simple DIV_ROUND_UP.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20031>
The main buffer is twiddled as before, but there's now also an auxiliary
compression buffer that we need to reserve space for.
With compression, the main buffer is aligned less. The macOS logic seems to be
to align to the page size only if the texture is both 3D and mipmapped, *and*
the layer stride is greater than the page size.
That's gated on compression being enabled. Page alignment seems to be needed for
uncompressed twiddled cube maps.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19999>
Better occupancy, which is especially important when the background shader
does memory access (for reloads). On my 4K monitor, glmark2 -bdesktop fullscreen
from 95fps to 133fps.
At default settings, glmark2 -bterrain from 63fps to 71fps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19997>
UBSan complains otherwise:
../src/asahi/compiler/agx_pack.c:701:21: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
../src/asahi/compiler/agx_pack.c:534:18: runtime error: left shift of 8 by 28 places cannot be represented in type 'int'
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
r5 and r6 are always getting lowered. Will prevent a regression with VBO
lowering on a shader which has stride=0 and hence gets the vertex ID read
optimized out with NIR:
dEQP-GLES2.functional.draw.random.50
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
Massive performance gains, some fps before/after numbers from glmark2:
[shading] 1486 -> 2391
[refract] 87 -> 127
[terrain] 32 -> 56
...and it's basically for free with enough copy/paste, so thank you to Boris
Brezillon for an excellent Asahi patch, the LRU cache seems to work great on M1
:-p
There are a few minor changes I made from panfrost, notably adjusting the
constants to account for 16KiB pages and switching from pthread_mutex to
simple_mtx to be less weird in Mesa.
For context on the design, the following commits evolved it in Panfrost and
their commit messages may be useful... The logic in this module is the product
of years of mistakes and correcting course :-)
f06809cdca ("panfrost: Evict the BO cache when allocation fails")
77d0498913 ("panfrost: Fix major flaw in BO cache")
ee82f9f07e ("panfrost: Try to evict unused BOs from the cache")
2225383af8 ("panfrost: Make sure the BO is 'ready' when picked from the cache")
9af4aeaaf7 ("panfrost: Don't return imported/exported BOs to the cache")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
This defeats the point of specifying alignments and of packing allocations
together with the BO cache. We're a real driver now, let's allocate memory like
one.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
We have these native. Passes the relevant piglits. Large reduction in memory
usage on Xonotic on higher settings (8x less memory per texture), which allows
Xonotic to run at high settings without OOMing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19903>
Flag day change to replace the previous hardcoded background/end-of-tile shaders
and the API-style load/store_output in fragment shaders with the generated
shaders and lowered *_agx intrinsics. This gets us working non-UNORM8 render
targets and working MRT. It's also a step in the direction of working MSAA but
that needs a lot more work, since the multisampling programming model on AGX is
quite different from any of the APIs (including Metal).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>