This comes in destructive and nondestructive flavours, to be used to
insert an instruction into a tuple and check if an instruction is
insertable respectively. It is responsible for FAU slot matching.
It's annoying this sort of logic is duplicated in 3 places
(bi_lower_fau, here, and packing) but they each work with different sets
of assumptions...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
Needed to satisfy max constant constaints. These aren't precise but
they should be a good enough approximation for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
In the near future we'll schedule out-of-order via a dependendency graph
and worklist. For now, emulate in-order operation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
For the new schedule infrastructure. This supports multiple tuples per
clause, unlike the old hack lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
To satisfy the numerous architectural scheduler constraints, quite a bit
of state is required per-tuple, per-clause, and per-block. These data
structures allow maintaining this state separate from the main IR
data structures, allowing for partial constructions and nondestructive
operations.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
Rather than doing this at pack time like before, or adding extra
constraints to the already overcomplicated scheduler, let's just include
it like a regular FAU source.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
We already track the full liveness so this is a trivial optimization,
with an especial win for shaders reading only a subset of components of
gl_FragCoord.
More importantly, it's required for proper scheduling (in soft mode)
when vectors are used and some (but not all components) are promoted to
temporary registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
Less critical than other metrics, but still matters for instruction
cache hit rate, and worth being aware of.
And, fine, it makes the scheduler look like a bigger win on another
axis.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
The prefetch buffer size is larger than first thought, but includes
the final clause, so subtract the size of the final clause from the
prefetch size.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8354>
Make it more obvious what the issue is. "_FAST" is not a suffix on
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8894>
For G71 but should work on any Bifrost, probably overlaps some CL stuff.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8894>
These should not be in a union together.
[Note: this does not need to be backported, since the affected
instruction is not emitted under any circumstances in the stable
branches]
Fixes: dd11e5076e ("pan/bi: Add new bi_instr data structure")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8894>
It is possible to compile pipelines in multiple threads, but when we
are dumping debug information for shaders, we want all the outputs
serialized so we can make sense of it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>
This makes it easier to preview docs changes in merge-requests. Also
make sure we build the docs right away, rather than waiting for when
marge merges. This allows us to see the artifacts right away.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8398>
It seems that VM_ALWAYS_VALID means that all BOs must fit in
memory (VRAM+GTT) for each submission. This is causing a lot of
troubles when the total allocated memory is greater than the
available memory, especially on APUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8779>
This will allow us to use the global BO list even without
RADEON_FLAG_PREFER_LOCAL_BO which can cause a lot of troubles
on APUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8779>
radv_cs_add_buffer() already guarantees that and virtual buffers
are added via radv_amdgpu_cs_add_virtual_buffer().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8859>
This saves a 64-bit pointer from radv_amdgpu_winsys_bo and it's
also common to pass a winsys pointer as the first parameter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8859>
This allows to remove one 64-bit pointer from radv_amdgpu_winsys_bo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8859>
It seems that recent fixes have made its results stable (other than
existing flakiness in texturegrad), so we can use all the CPUs and a
couple more boards and get more coverage.
Acked-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8787>
With the runner fixes, we were down to 2 minutes of boot time and 2
minutes of CTS time for a total of 4 minutes. We've got plenty of time
budget now to increase our coverage.
Acked-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8787>
3 commits in 0.5.0:
- 20-40s savings on many of our CI runs by dropping the clever test size
scaling code.
- Even bigger savings (especially on deqp-vk runs) by increasing maximuim
test group size (~1/4 of runtime was spawning deqp on cheza, that cost
is cut by ~75%)
- No more needing to manually set MESA_DEBUG=silent
2 commits in 0.5.1:
- Fixed automatic thread pool sizing to keep all CPUs busy (thanks for
catching that Bas!).
- Automatically size down test groups on short test lists and many CPUs,
so split the list evenly between CPUs (such as on freedreno -options
jobs).
Acked-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8787>
The base mask previously used was 0xffffffff. This is not correct (but
should still work) for 16-bit and 8-bit values, but it means the high
32-bits of 64-bit values will get chopped off.
Instead of just restricting the pattern to 32-bits (as was done before
00b28a50b2), this extends the optimization in two ways:
1. Make it correct for other bit sizes.
2. Make it work for arbitrary shift counts.
This has the added benefit of reducing the number of patterns actually
added (7 previously, 4 now).
The "Reassociate for improved CSE" part is just reverted to its
pre-00b28a50b2c behavior. I doubt that pattern is likely to have much
impact outside 32-bits.
This change fixes the piglit tests
tests/spec/arb_gpu_shader_int64/fs-shl-of-shr-int64.shader_test and
tests/spec/arb_gpu_shader_int64/fs-iand-of-iadd-int64.shader_test.
All of the shaders helped in shader-db are vertex shaders on platforms
with vector-oriented vertex processing. The shaders contain ((x >> 16)
<< 16). These platforms set lower_extract_word, so the optimization
that transforms (x >> 16) to extract_u16 doesn't trigger. With only ~60
shaders involved, I didn't bother trying to add extract_XYZ versions of
these patterns to try to get those cases.
Fixes: 00b28a50b2 ("nir/algebraic: trivially enable existing 32-bit patterns for all bit sizes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Haswell and earlier Intel GPUs had simlar results. (Haswell shown)
total instructions in shared programs: 16397554 -> 16397496 (<.01%)
instructions in affected programs: 7961 -> 7903 (-0.73%)
helped: 58
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 1.89% x̄: 0.99% x̃: 0.78%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.13% -0.85%
Instructions are helped.
total cycles in shared programs: 1035483770 -> 1035483504 (<.01%)
cycles in affected programs: 75922 -> 75656 (-0.35%)
helped: 44
HURT: 2
helped stats (abs) min: 2 max: 12 x̄: 6.14 x̃: 2
helped stats (rel) min: 0.05% max: 1.67% x̄: 0.87% x̃: 0.72%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -7.28 -4.29
95% mean confidence interval for cycles %-change: -1.03% -0.63%
Cycles are helped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8852>
this fixes a bunch of issues with 3-component formats, which aren't supported
for various operations on certain drivers
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8697>
Fixes the following building error:
external/mesa/src/amd/vulkan/radv_android.c:752:77: error: too few arguments to function call, expected 4, have 3
VkResult result = radv_image_create_layout(device, create_info, mem->image);
~~~~~~~~~~~~~~~~~~~~~~~~ ^
external/mesa/src/amd/vulkan/radv_private.h:2175:1: note: 'radv_image_create_layout' declared here
VkResult
^
1 error generated.
Fixes: 7f7da82dbb ("radv: Add image layout with drm format modifiers.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8899>
Fixes the following building error in Android:
FAILED: ninja: 'external/mesa/src/amd/vulkan/radv_entrypoints_gen.py',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_radv_common_intermediates/radv_entrypoints.c',
missing and no known rule to make it
Fixes: 23f8ca0c9d ("radv: port to using common dispatch code.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8899>
The hardware can only scan-out linear buffers with a pitch
aligned to 256. It can only use packed buffers for cursors.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8500>