Allow to vectorize operations from a smaller bit-size into
scalar operations of a larger bit-size. This allows us to
turn 2x8-bit into a equivalent scalar 16-bit load/store.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This generalizes the support we added for 16-bit to also handle
8-bit loads via ldunifa. The story is the same: we align the address
to 32-bit downwards and we skip any bytes that are not of interest.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Just like with 16-bit, this mode only supports scalar access, but
we are already lowering all non 32-bit accesses to scalar.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Since ldunif is a 32-bit instruction we need to demote these to
UBO loads, like we do for indirect indexing, with the exception
of scalar 16bit uniforms with an offset that is 32-bit aligned.
For the exception where we can use lfdunif we read a 32-bit slot
from memory where the uniform data is in the lower 16-bit and we
will read garbage in the upper 16-bit which we won't use anyway.
It should be noted that by using ldunif, we are consuming
32-bit from the uniform stream, but this is fine because
if there is valid uniform data in the upper 16-bit (i.e.
we had a ivec2 uniform aligned to a 32-bit address), since
we scalarize 16-bit loads, we would see another load uniform
with an unaligned offset for the second component, which we
will demote to UBO.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
These are required by VK_KHR_16bit_storage. Our hardware, however,
doesn't provide any mechanism to decide on the rounding mode of
the conversion and it seems to be using RTE, so we implement
RTZ in software.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
If we know we have a load with a constant offset, then even if it
is not aligned to 32-bit we can still produce an aligned offset
and then skip over the bytes we don't need.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Even though ldunifa is strictly 32-bit we may be able to use it
to load 16-bit values that sit at 32-bit aligned addresses.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
The vectorization pass can inject 32_2x16 (un)packing opcodes
upon successful vectorization of 16-bit operations into 32-bit
counterparts, so make sure we lower these to something our
backend can handle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This allows us to implement 16-bit access on uniform and
storage buffers.
Notice that V3D hardware can only do general access on scalar
16-bit elements, which we currently enforce by running a lowering
pass during shader compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.
It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This saves a lot of pointless gl.h includes across the board,
it moves the one place that needs GLenum into a separate file
only used in those passes that require it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
They are being used on integer to integer stores. From Vulkan sec,
final paragraph of 16.4.4 "Texel Output Format Conversion":
"Each component is converted based on its type and size (as
defined in the Format Definition section for each
VkFormat). ... Integer outputs are converted such that their value
is preserved. The converted value of any integer that cannot be
represented in the target format is undefined."
I didn't find a equivalent quote for OpenGL as all conversion entries
are forcused on float to integer, fixed-point to integer, etc, and not
on integer to integer. Didn't find any test failure with this change.
We didn't get any shader-db stats change with shaderdb (even
overriding to OpenGL 4.4 to get more shaders built), so as a reference
Vulkan shader-db stats with the pattern
dEQP-VK.image.*.with_format.*.*
total instructions in shared programs: 37534 -> 36522 (-2.70%)
instructions in affected programs: 12080 -> 11068 (-8.38%)
helped: 241
HURT: 0
Instructions are helped.
total uniforms in shared programs: 9100 -> 8550 (-6.04%)
uniforms in affected programs: 3004 -> 2454 (-18.31%)
helped: 229
HURT: 0
total max-temps in shared programs: 6110 -> 6014 (-1.57%)
max-temps in affected programs: 402 -> 306 (-23.88%)
helped: 43
HURT: 0
Max-temps are helped.
total nops in shared programs: 1523 -> 1526 (0.20%)
nops in affected programs: 21 -> 24 (14.29%)
helped: 3
HURT: 6
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14194>
Instead of stopping the merge process when we find an instruction
with an incompatible signal (such as an small immediate), keep
going and see if we can merge the thrsw in a previous instruction
that is compatible.
total instructions in shared programs: 13409835 -> 13356648 (-0.40%)
instructions in affected programs: 3556860 -> 3503673 (-1.50%)
helped: 17457
HURT: 18
Instructions are helped.
total max-temps in shared programs: 2353971 -> 2352956 (-0.04%)
max-temps in affected programs: 13960 -> 12945 (-7.27%)
helped: 703
HURT: 0
Max-temps are helped.
total spills in shared programs: 12301 -> 12301 (0.00%)
total sfu-stalls in shared programs: 32596 -> 32499 (-0.30%)
sfu-stalls in affected programs: 225 -> 128 (-43.11%)
helped: 79
HURT: 3
Sfu-stalls are helped.
total nops in shared programs: 347204 -> 325234 (-6.33%)
nops in affected programs: 99834 -> 77864 (-22.01%)
helped: 11515
HURT: 158
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14172>
In the V3D driver there is a NIR lowering step for `image_store`
intrinsic, where the image store format is required for doing the proper
lowering.
Thus, let's define it for the download FS instead of
keeping it as NONE.
v2 (Illia)
- Use format only for drivers not supporting format-less writing.
v4 (Illia):
- Use PIPE_CAP_IMAGE_STORE_FORMATTED to reduce combinations.
v5 (Ilia):
- Use indirect array for download FS in not formatless-store support
drivers.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
In some cases we need to make the shaders write the Z value produced
from rasterization (FEP). Track these instances because they are relevant
to early EZ setup.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
Even although the option is called shaderdb, it is not really used by
shaderdb (for V3D shaderdb uses the debug option "precompile"). And in
fact, right now the output format is not compatible with shaderdb.
This commit tries to fix that, and as we are here, also try to make
the option more useful for the Vulkan case, as that debug option also
works with v3dv.
We can't really fully imitate shaderdb use with OpenGL (run with a set
of glsl shader tests), but we can at least assign a unique name (the
pipeline sha1 in text format) so we can compare executions of the same
vulkan application. For that remember to disable the on-disk cache.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13938>
If we did, we would have the instruction coming right after ldvary write
to the same implicit destination as ldvary at the same time. We prevent
this when merging instructions, but we should make sure we prevent this
when we move ldvary around for pipelining too.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13921>
According to the spec the hardware locks the scoreboard on the first
or last thread switch (selected via shader state) and any TLB accesses
executed before this are not synchronized by hardware.
This change updates the logic to ensure we respect this requirement
and that we don't assume that the lock is acquired automatically
on the first TLB access, which is not valid at least since V3D 4.1+.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
Writes to physical registers are not allowed after thread end. We
were checking this for ALU writes, but we need to check it for
signal writes too.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
This solves a case where a NIR geometry shader was storing the output in
a non-constant:
vec4 32 ssa_1 = load_const (0xc0800000 /* -4.000000 */, 0xc1100000 /* -9.000000 */, 0x40400000 /* 3.000000 */, 0x40e00000 /* 7.000000 */)
vec1 32 ssa_7 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_8 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_9 = iadd ssa_7, ssa_8
vec1 32 ssa_19 = mov ssa_1.x
intrinsic store_output (ssa_19, ssa_9) (1, 1, 0, 160, 288) /* base=1 */ /* wrmask=x */ /* component=0 */ /* src_type=float32 */ /* location=32 slots=2 gs_streams(x=0 y=0 z=0 w=0) */
When lowering the VPM output we check if the destination (ssa_9 in this
case) is a constant to add to the VPM offset. We run a constant folding
optimization in an earlier VS lowering, and we should do the same for
GS.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
While fragment and geometry shader were handling structs as inputs, they
weren't doing for it arrays of structures.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures and
assertions.
v2:
- Fix style (Iago).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
When early fragment tests are mandated by the shader, we must use
the Z value produced by the FEP even if there are elements that
would typically require late fragment tests (such as discards,
sample to coverage, etc).
This change means we also need to be a bit more careful when
we promote shaders to use early fragment tests so we don't
promote anything with discards for example.
Fixes:
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_stencil
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13837>
We had been storing pointers to a driver owned swizzle table
rather than storing the actual swizzle value in various shader
and pipeline keys on both GL and Vulkan drivers.
This doesn't look very robust, particularly since we also
compute sha1 hashes from these values and we may store these
hashes to disk (for the disk cache).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13738>
Typically, optimization passes go through all the blocks in a shader
and make adjustments on the fly, so we always want them to update
the current block or the current block pointer will become outdated.
Also, we don't need to keep track of the previous current block
pointer to restore it, since optimization passes run after we have
completed conversion to VIR, and therefore, anything that comes after
that should always set the current block before emitting code.
Fixes debug assert crashes when running shader-db:
vir.c:1888: try_opt_ldunif: Assertion `found || &c->cur_block->instructions == c->cursor.link' failed
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13625>
This was not quite correct in that our checks for the allowed cases
were not checking that there were no other peripheral access other
than the ones allowed.
For example, we allowed wrtmuc signal and TMU write other than
TMUC, and we also allowed TMU read and VPM read/write. But we
cannot allow wrtmuc with TMU write other than TMUC and at the
same time a VPM write for example, so we can't just check if we
have a combination of allowed peripherals, we still need to check
that those are the only ones in use by the combined instructions.
Another example is that even if we allow a TMU write (other than TMUC)
with a wrtmuc signal, the resulting instruction must still have just
one TMU write other than TMUC, but we were allowing the merge if one
instruction signaled wrtmuc and the other wrote to tmu other than tmuc
without testing if the combined result would have 2 tmu writes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13527>
This was not considering the possibility that the driver has called
nir_before_block() or nir_after_block() to update the cursor, in which
case the cursor link points to the instruction list header and not
to an actual instruction.
Fixes incorrect debug-assert crash in:
dEQP-VK.graphicsfuzz.cov-increment-vector-component-with-matrix-copy
Fixes: 265515fa62 ("broadcom/compiler: check instruction belongs to current block")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13467>
A TSY barrier becomes effective at the point of the next thread switch,
so if we have one coming after a previous thread switch we need to
be careful not to emit it in its delay slots, or we would be effectively
moving the barrier earlier than intended.
Fixes simulator assert crash in:
dEQP-VK.graphicsfuzz.two-for-loops-with-barrier-function
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13468>
It is really small, and used just twice, so we just call qpu_magic.
We also update how it is used:
* QFILE_NULL is an undef so we can just load anything. Previously we
were using accumulator 0, but there isn't any real reason to use
an accumulator for this. Using reg 0.
* QFILE_LOAD_IMM: it seems that we don't use at all right now, so
let's add an assert
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13008>
As we don't know if we are going to have spilling or not, emit always a
last thrsw at the end of the shader.
If later we don't have spillings and we don't need that last thrsw, we
remove it and switch back to the previous one.
This way we ensure all the spilling happens always before the last
thrsw.
v2 (Juan):
- Rework the code to force a last thrsw and remove later if no spilling
v3:
- Merge functionality inside vir_emit_last_thrsw (Iago)
- Add vir_restore_last_thrsw (Juan)
v4 (Iago):
- Fix/add new comments
- Rename variables/parameters
v5 (Iago):
- Fix comments
- Add assertion
Cc: mesa-stable
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4760
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12322>
This function is only used in v3d_nir_to_vir(), so make it private.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12322>
We had an optimization to auto-enable early fragment tests when a shader
didn't have side effects, but of course, we cannot do that this if the
shader writes Z, as in that case the fragment tests need to use the
value written from the shader.
Also, if the shader enables early fragment tests, then any shader Z
writes should be ignored.
Fixes:
dEQP-VK.spirv_assembly.instruction.graphics.early_fragment.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12736>
Fix defect reported by Coverity Scan.
Same on both sides (CONSTANT_EXPRESSION_RESULT)
pointless_expression: The expression inst->qpu.flags.auf !=
V3D_QPU_UF_NONE || inst->qpu.flags.auf != V3D_QPU_UF_NONE does not
accomplish anything because it evaluates to either of its identical
operands, inst->qpu.flags.auf != V3D_QPU_UF_NONE.
Fixes: 3f2c54a27f ("broadcom/compiler: rewrite partial update liveness tracking")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12385>