LDP can use a negative offset (which is valid). Since
`struct ir3_register` uses `{i,u}nt32_t` for the immediate
values, using `extract_reg_uim()` meant that negative
immediate values were not sign-extended.
Addresses:
```
src/freedreno/isa/encode.h:84:
pack_field: Assertion '!(( val & ~BITFIELD64_MASK(1 + high - low)) &&
(~val & ~BITFIELD64_MASK(1 + high - low)))' failed.
```
seen in https://gitlab.freedesktop.org/mesa/mesa/-/issues/11153 .
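A minimal sketch of why the assertion fires, assuming a 13-bit field at
bits 0..12 (the field position is made up for illustration) and using
hypothetical helpers `mask64()`, `fits_field()`, `widen_unsigned()` and
`widen_signed()` that are not part of Mesa. `fits_field()` mirrors the
condition asserted in pack_field: the bits above the field must be all
zeros or all ones.
```c
#include <stdbool.h>
#include <stdint.h>

/* Low `bits` bits set; stands in for BITFIELD64_MASK (hypothetical helper). */
static uint64_t mask64(unsigned bits)
{
   return bits >= 64 ? ~UINT64_C(0) : ((UINT64_C(1) << bits) - 1);
}

/* Same condition as the pack_field assertion: the bits outside the field
 * must be either all zero or all one. */
static bool fits_field(uint64_t val, unsigned low, unsigned high)
{
   uint64_t m = mask64(1 + high - low);
   return !((val & ~m) && (~val & ~m));
}

/* Zero-extends: a negative 32-bit immediate gets 32 zero bits on top. */
static uint64_t widen_unsigned(int32_t imm)
{
   return (uint64_t)(uint32_t)imm;
}

/* Sign-extends: a negative 32-bit immediate stays all-ones on top. */
static uint64_t widen_signed(int32_t imm)
{
   return (uint64_t)(int64_t)imm;
}
```
With an offset of -8, the zero-extended value has a mix of ones and zeros
above the field and is rejected, while the sign-extended value passes.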
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29768>
It prevents the hazard in the following case:
ldc.1.k.imm c[a1.x], 0, 1
(ss)mova1 a1.x, 8
The correct way is:
ldc.1.k.imm c[a1.x], 0, 1
(ss)mova1 a1.x, (r)8
Without it, ldc may use the a1.x value that is set after the ldc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27462>
isam.v is a version of isam that can load multiple components from IBOs.
It uses some bits that are used for different purposes in other tex
instructions:
- bit 50 (.v): .s elsewhere
- bit 53 (indicates whether an immediate offset is used): .p elsewhere
- bit 18 (.1d when not set, has to be set for .v): 0 elsewhere
For this reason, the bitset hierarchy for cat5 had to be reordered a
bit.
The immediate offset is encoded as an extra (immed) source register and
an instruction flag (to be able to make the distinction between offset
zero and no offset, although this might not be useful).
This also adds a flag for the .1d field. Since this bit is active-low,
this flag has inverted semantics: setting it will make .1d inactive.
Note that some existing disassembler tests for isam had to be updated
because the bit is never set and this is now disassembled as .1d. This
matches the blob's disassembler.
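A minimal sketch of how the three bits above could be read from an
instruction word, assuming bit N is counted from the least significant
bit of the 64-bit instruction. `struct isam_bits` and
`decode_isam_bits()` are hypothetical names used only for illustration;
they are not the isaspec-generated code.
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper struct, not part of ir3. */
struct isam_bits {
   bool v;           /* bit 50: .v (this bit is .s in other tex instructions) */
   bool has_imm_off; /* bit 53: immediate offset used (.p elsewhere) */
   bool is_1d;       /* bit 18 is active-low: .1d when the bit is NOT set */
};

/* Assumes bit 0 is the LSB of the 64-bit instruction word. */
static struct isam_bits decode_isam_bits(uint64_t instr)
{
   struct isam_bits b;
   b.v = (instr >> 50) & 1;
   b.has_imm_off = (instr >> 53) & 1;
   b.is_1d = !((instr >> 18) & 1);
   return b;
}
```
An all-zero word decodes as .1d, which is the same reason the existing
isam tests started to disassemble with .1d.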
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28664>
We used to model predt/predf as taking a predicate register source. The
blob disassembler shows them taking a label argument. However, it seems
that both are incorrect: the condition is always taken from p0.x and I
have not been able to construct a test case where the label makes any
difference.
This patch changes predt/predf to not take any arguments and adds
documentation about how predicated execution works.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27982>
This will switch everyone to the isa specific functions.
Fixes the output of etnaviv's pre_instr_cb callback if
freedreno and etnaviv are built at the same time.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28176>
Create a static library that just contains isa_print(..). We
need to do this step to make lto happy.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28176>
Any component that links against libir3decode should not need to
care whether the generated isa files exist.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28176>
We currently have a bit of a confusing situation where we have both
opcodes for the different branches (OPC_BR, OPC_BRAA,...) and branch
types which are supposed to be used with OPC_B (BRANCH_PLAIN,
BRANCH_AND,...). However, not every kind of branch has a corresponding
type. For example, getone is represented by OPC_GETONE instead of a
branch type.
This patch proposes to get rid of the branch types and use opcodes
everywhere. I think this makes the representation of branches more
consistent. It also removes the need for the encoder to translate branch
types into opcodes.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
_Presumably_ invalidates the workgroup-wide cache for image/buffer data
access. So while "fence" is enough to synchronize data access inside a
workgroup, for cross-workgroup synchronization we have to invalidate
that cache.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
Though the blob is not seen to even use mode1 on a740; it uses the
S2EN variant instead.
Fixes:
dEQP-VK.binding_model.descriptor_buffer.multiple.*
dEQP-VK.binding_model.descriptor_buffer.embedded_imm_samplers.*
dEQP-VK.pipeline.monolithic.descriptor_limits.compute_shader.*
Adapted from Jonathan Marek's changes.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
A new attribute on source GPRs reflecting if a certain usage of a
value is the last usage of it was added in A7xx. This is seemingly
a performance hint and doesn't affect anything when not applied.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21498>
STore Shared Const - loads SIZE dwords from HLSQ_SHARED_CONSTS_IMM,
starting from HLSQ_SHARED_CONSTS_IMM[SRC], and writes them to c[DST].
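A rough software model of the copy described above, assuming that SRC,
DST and SIZE are all plain dword indices into flat arrays (the real
units and the layout of c[] are not stated here). `stsc_model()` is a
hypothetical name for illustration only.
```c
#include <stddef.h>
#include <stdint.h>

/* Copies `size` dwords from shared_consts_imm[src..] to c[dst..].
 * Assumption: all three operands are dword indices into flat arrays. */
static void stsc_model(uint32_t *c, const uint32_t *shared_consts_imm,
                       size_t dst, size_t src, size_t size)
{
   for (size_t i = 0; i < size; i++)
      c[dst + i] = shared_consts_imm[src + i];
}
```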
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21498>
The new stg.a/ldg.a addressing form supersedes the a6xx's one.
The new form is:
ldg.a.f32 r4.y, g[c0.z+r4.y+2], 4
There is no shift, compared to the a6xx form:
ldg.a.f32 r4.y, g[r0.z+(r4.y)<<2], 4
Also, on a7xx the first src is allowed to be either a const or a gpr.
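A literal reading of the two syntaxes above as address arithmetic, given
only as a sketch. The base is treated as an opaque 64-bit value, and
whether the hardware additionally scales the offset or the immediate is
an open assumption. Both function names are hypothetical.
```c
#include <stdint.h>

/* a7xx form g[base+off+imm]: plain sum, no shift (literal reading). */
static uint64_t ldg_a_addr_a7xx(uint64_t base, uint32_t off, uint32_t imm)
{
   return base + off + imm;
}

/* a6xx form g[base+(off)<<shift]: the offset is shifted before adding. */
static uint64_t ldg_a_addr_a6xx(uint64_t base, uint32_t off, unsigned shift)
{
   return base + ((uint64_t)off << shift);
}
```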
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21498>
[00000001x_00000000x] nop ; dontcare bits in nop: 0000000100000000
[00000002x_00000000x] nop ; dontcare bits in nop: 0000000200000000
Doesn't seem to make them different from ordinary nops.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21498>
Has short and long variants; the long one seems to be ~20 times longer.
The exact difference between it and a bunch of nops is unknown.
The emission of this instruction was not observed in the wild.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14419>
- tcinv - Likely Texture Cache Invalidate (unverified)
- icinv - Mostly sure that it is Instruction Cache Invalidate
- dccln - Data Cache Clean
- dcinv - Data Cache Invalidate
- dcflu - Data Cache Flush
The emission of these instructions was not observed in the wild.
TODO: find out the difference between .shr and .all modes of
dccln, dcinv, dcflu.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14419>
The way the isaspec decoder used to work was that it would generate a
header and a C file, each with ISA-specific stuff in it. Then that would
get built together with a stand-alone decode.c file which lives in the
isaspec folder, not the driver's folder. In order for decode.c to find
the ISA-specific headers, it would also generate a glue header which had
to be named isaspec-isa.h. This effectively meant that you can't have
multiple isaspec definitions in the same folder.
To solve this, we do it the other way around and make the generated
header and C files include the stand-alone files. This is a bit awkward
because it means including a C file from another C file but it's better
for the build system.
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20525>
This also cleans up some of our python script execution conventions and
handles mako errors better. Copied a bit from vk_entrypoints_gen.py.
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20525>
The implementation of isa_decode(..) is already part of isaspec. So let's
move the function declaration and some related structs to src/isaspec.
Also make the header C++ safe.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18403>
Add in the type, even though it turns out to not be that useful. Add
in support for assembling it. Add some notes based on computerator
experiments. And add support for the indirect a1.x mode that's needed
for storing c64.x and later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
The zw were already known, but throw them in here too. I'm not extremely
happy with the description of "y", feels like there's a simpler
explanation there, but I couldn't find it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14672>
Some bits are slightly different on a4xx. Use the encodings that work.
Perhaps these can be combined at some point if we get a proper
understanding of what they mean.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14789>
This is necessary for some ops which have slightly different encoding on
a4xx/a5xx, but are otherwise identical. This helps keep the compiler
from having to worry about these details and creating separate ops.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14789>
* shrm - (src2 >> src1) & src3
* shlm - (src2 << src1) & src3
* shrg - (src2 >> src1) | src3
* shlg - (src2 << src1) | src3
* andg - (src2 & src1) | src3
* dp2acc - dot product of two {i,u}8vec2 packed into
SRC1 and SRC2, added to 32b SRC3
* dp4acc - dot product of two {i,u}8vec4 packed into
SRC1 and SRC2, added to 32b SRC3
* wmm - vec4(x_1, x_2, x_3, x_4) * (y_1 + y_2 + y_3 + y_4), which is
duplicated (1 << (SRC3 / 32)) times starting from DST register
* wmm.accu - same as wmm but result is added to DST registers, however
the first reg in each vec4 result is overwritten instead of
accumulating.
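The scalar ops in the list above can be sketched as C reference
functions. This is an illustration only: masking the shift amount to 5
bits is an assumption made to keep the C well-defined, the function
names are hypothetical, and only the signed i8vec4 case of dp4acc is
shown. wmm is left out because its semantics are not fully pinned down
here.
```c
#include <stdint.h>

/* Assumption: shift amounts are taken modulo 32. */
static uint32_t shrm(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 >> (src1 & 31)) & src3;
}

static uint32_t shlm(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 << (src1 & 31)) & src3;
}

static uint32_t shrg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 >> (src1 & 31)) | src3;
}

static uint32_t shlg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 << (src1 & 31)) | src3;
}

static uint32_t andg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 & src1) | src3;
}

/* Extracts byte `i` of `packed` as a signed 8-bit value. */
static int32_t unpack_i8(uint32_t packed, unsigned i)
{
   int32_t v = (int32_t)((packed >> (8 * i)) & 0xff);
   return v >= 128 ? v - 256 : v;
}

/* Signed variant: dot product of two i8vec4 plus a 32-bit accumulator. */
static int32_t dp4acc_i8(uint32_t src1, uint32_t src2, int32_t src3)
{
   int32_t acc = src3;
   for (unsigned i = 0; i < 4; i++)
      acc += unpack_i8(src1, i) * unpack_i8(src2, i);
   return acc;
}
```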
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
Fix defect reported by Coverity Scan.
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression 2 << W
with type int (32 bits, signed) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression
of type uint64_t (64 bits, unsigned).
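A minimal sketch of the usual remedy for this class of defect, assuming
the fix is to make the left operand 64 bits wide before shifting. The
function name `two_shl_wide()` is hypothetical and does not come from
the patched code.
```c
#include <stdint.h>

/* Flagged pattern: (uint64_t)(2 << w) shifts a 32-bit int first, so the
 * result overflows before it is widened.
 * Widening the operand first keeps the whole shift in 64-bit arithmetic. */
static uint64_t two_shl_wide(unsigned w)
{
   return UINT64_C(2) << w;
}
```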
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14258>
Separating atomic opcodes makes it possible to express a6xx global
atomics which take iova in SRC1. They would be needed by
VK_KHR_buffer_device_address.
The change also makes it easier to distinguish atomics in conditions.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>