The ir3_info is reset by ir3_collect_shader_info() on the expectation
that all info is collected inside that function. This meant that we were
accidentally disabling early preamble. Re-enable it.
We keep a copy in ir3_info for shader statistics in the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29903>
The following syntax can now be used to set the initial content of
buffers:
@buf size (reg) val0, val1, ...
If the buffer is not fully initialized, remaining values will be set to
zero.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28625>
We now have a lot of feature toggles in fd_dev_info. Generate
env var options for all of them to quickly test whether feature
misbehaves or test its impact on the performance.
FD_DEV_FEATURES=%feature_name%=%value%:%feature_name%=%value%:...
e.g.
FD_DEV_FEATURES=has_fs_tex_prefetch=0:max_sets=4
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25939>
The set of magic regs is different between generations and even
sub-gens. Adding a new one and/or emitting one on specific generation
takes much more code than necessary. Doing this in a single place is
much nicer.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
Event write is changes so much in a7xx that it makes sense to
create a new event CP_EVENT_WRITE7.
All credits to Connor Abbott for finding out what different flags
in these commands are doing.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23881>
This moves away from embedding the submit fence inside the pipe fence,
which lets us start refcnt'ing the fence. This will enable several
cleanups and improvements:
1. Get rid of fd_bo_fence, and just have fd_bo hold pending fd_fence
refs instead, which will be needed for cpu_prep implementation of
sub-allocated buffers.
2. For merged submits, we can just return a new reference to an
existing fence.
Note that this temporarily defeats submit-merging, which will be
fixed (and improved) in a following commit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20263>
Each shader stage has its own "early preamble" flag.
Early preamble is likely an optimization to hide some of latency
when loading UBOs into consts in the preamble.
Early preamble has the following limitations:
- Only shared, a1, and consts regs could be used
(accessing other regs would result in GPU fault);
- No cat5/cat6, only stc/ldc variants are working;
- Values writen to shared regs are not accessible by the rest
of the shader;
- Instructions before shps are also considered to be a part of
early preamble.
Note, for all shaders from d3d11 games blob produced preambles
compatible with early preamble mode.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15901>
Since these were reverse-engineered, it's become clear that IBO
descriptors are just a subset of texture descriptors, and bindless reads
of readonly images actually use isam on the IBO descriptor, further
confirming that the two are always compatible, even if not all of the
texture fields exist for IBOs. It's pointless to have a separate type
for IBOs, and just leads to things getting out-of-sync unnecessarily
which has already happened. Just remove it and use TEX_CONST insted.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
Inferring from blob's cmdstream the size of shader instruction
cache for:
- a630 is 64
- a650 is 128
- a660 is 128
On a650 and a660 gpu could hang if we exceed the limit. Though
it is not reproducible with computerator or a single amber
test. Also while blob limits the size to 128 - Turnip still
hangs with it but does not hang with the limit of 127.
On a630 there seem to be no hang when limit is exceeded.
Fixes the hang of compute shader in Alien Isolation on a650/a660.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14044>
This backend provides very basic a4xx support. It's enough to run
kernels with explicit stg/etc ops, but not with stgb/ldgb type access.
There is no perfcounter support hooked up yet either.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12784>
Move away from using gpu_id as the primary means to identify which
adreno we are running on, as future GPUs (starting with 7c3) stop
providing a gpu_id as a new naming scheme is introduced.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
The full form for ldg.a/stg.a offset is:
g[reg_address + reg_offset << (imm_shift + 2) + imm_offset << 2]
where imm_shift is in [0, 3] and imm_offset is in [0, 3]
a6xx blob was found to produce a bit simplier offset calculations
for TES/TCS shaders in GTA V:
[c002000a_03c14215] ldg.a.f32 r2.z, g[r1.y+((r2.z+1)<<2)], 3;
[c0020004_01c14609] ldg.a.f32 r1.x, g[r1.y+((r1.x+3)<<2)], 1;
Our new syntax:
stg.a.u32 g[r2.x+(r1.x+1)<<2], r5.x, 1
stg.a.u32 g[r2.x+r1.x<<4+3<<2], r5.x, 1
ldg.a.f32 r1.w, g[r1.y+(r1.w+1)<<2], 3
ldg.a.f32 r1.w, g[r1.y+r1.w<<5+2<<2], 3
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11431>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
Move everything into a struct assocated with the pipe_fence_handle, so
that the drm layer can fill in the seqn/fd fences directly.
This will give us a comvenient place to insert a util_queue_fence in the
next commit.
While we're at it, extract the uint32_t fence (previously called
'timestamp' in place, a kgsl legacy) into a struct that encapsulates
both the kernel fence and the userspace fence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Most of them were actually unused. The memory type (KMEM vs SMI) only
applied to very old a2xx era devices that had a small/fast stacked
memory (SMI) vs normal memory (KMEM). And the cache flags are ignored
(ie. everything is writecombine), but we can add new cache flags later
when they actually do something.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
On a6xx/a5xx there is such dependency between branchstack bitfield
and the amount of nested ifs, which could be seen with blob:
IFs BRANCHSTACK
0 0
1 1
2 2
3 2
4 3
5 3
6 4
...
59 30
60 31
61 31
62 32
63 32
64 32
Remove open-coded branchstack for a5xx compute along the way.
Fixes tests:
dEQP-VK.spirv_assembly.instruction.compute.float16.opvectorshuffle.344
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.344_vert
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.444_geom
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.244_tessc
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.344_frag
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9859>
This needs to be part of the compiler because it's the only piece that
we always have access to in all the places ir3_optimize_loop() is
called, and it's only enabled for the whole Vulkan device. Right now
it's just used for constraining vectorization, but the next commit adds
another use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
When float16 is enabled this will allow to pass a number of
float16 tests.
When A6XX_SP_FLOAT_CNTL_F16_NO_INF is set - all operations which
generate +-infinity generate +-MAX_HALF_FLOAT.
Fixes some tests from:
dEQP-VK.spirv_assembly.instruction.*.float16.*
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.*
E.g.:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.sinh_vert
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_4.length
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log2_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.inv_sqrt_denorm_flush_to_zero_nostorage
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>