There are multiple LLVM passes that very much move the
intrinsic using the descriptor outside of the loop, defeating
the entire point of creating the loop.
Defeat the optimizer by splitting the break into a separate
if-statement and putting an optimization barrier on the bool
in between.
v2: Move from a callback based system to begin/end loop.
This does not make it significantly less intrusive but
is a bit nicer with all the extra struct and callback
stubs.
v3: Deal with non-divergent values in divergent path.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2160
Fixes: 028ce52739 "radv: Add non-uniform indexing lowering."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4109>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4109>
Some hardware doesn't support subgroup shuffle, and on such hardware
it makes no sense to lower quad broadcasts to shuffle. Instead, let's
lower them to four const quad broadcasts, paired with bcsel instructions.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4147>
Once LCRA has run, we have a map from IR indices to byte offsets into
the register file, so we need to "install" these results, rewriting the
IR to use native registers and fixing up writemasks/swizzles to
substitute vectorization for adjacent registers (for LCRA, we're
modeling in terms of real vectors).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
We model the machine as vector (with restrictions) to natively handle
mixed types and I/O and other goodies. We use LCRA for the heavylifting.
This commit adds only the modeling to feed into LCRA and spit LCRA
solutions back; next commit will integrate it with the IR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
We want types to be consistent throughout the IR so we don't have to
make exceptions to parse things out. These cases just got missed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
The issue was messing with liveness analysis... with Midgard we look at
the writemask to decide how the instruction behaves. Here, since our ALU
is scalar (except for subdivision which doesn't have proper writemasks
anyway) we just look at the component count directly -- either 4 for
vector instructions (essentially - for smaller loads we can replicate
manually without much burden), or 1 for scalar.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
1. Implement vkCmdBindTransformFeedbackBuffersEXT,
vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT.
- Not handling counter buffers yet.
2. Implement streamout emit function, mostly taken from fd6_emit.c
v2. Replace emit_pkt4 funcs with emit_regs.
v3. Don't copy the state of stream-output from tu_pipeline.
v4. Set zero to VPC_SO_CNTL/VPC_SO_BUF_CNTL in tu6_init_hw.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
Mostly taken from fd6_program.c.
v2. Note that it forces to use full VS instead of binning pass VS if
there's stream output as the binning pass VS will have outputs on
other than position/psize stripped out, which is the same as freedreno.
v3. fix indentation.
v4. Use register index instead of location when setup streamout.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
Define new structures for streamout buffers and state.
Most members of the state struct are taken from freedreno driver.
v2. Use IR3_MAX_SO_* and avoid using magic values.
v3. Remove the state of stream-output in tu_cmd_state and use one in
tu_pipeline and split out reset and enabled fields.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
- Add one member to the existed ir3_stream_output so that we could
assign location information from nir_xfb_info, rather than defining
new struct.
- Redefine maximum of so buffers, streams and outputs, which will be
used for turnip.
- Also enable caps for transform feedback for spirv_to_nir.
v2. Remove redefined maximums and use IR3_MAX_SO_* and add
IR3_MAX_SO_STREAMS.
v3. Remove the newly added location field so that we could keep aligned
with 32 bytes. Instead we create an array mapping between the location
and consecutive index, which is GL driver is doing.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
When creating an egl surface from an ANativeWindow, the window's usage
flags need to be set so that buffers are allocated properly.
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lepton Wu <lepton@chromium.org>
This supports powering up the device (using an external tool you
provide based on your particular lab), talking over serial to wait for
the fastboot prompt, and then booting a fastboot image on a target
device.
I was previously relying on LAVA for this, but that ran afoul of
corporate policies related to the AGPL. However, LAVA wasn't doing
too much for us, given that gitlab already has a job scheduler and
tagging and runners. We were spending a lot of engineering on making
the two systems match up, when we can just have gitlab do it directly.
Lightly-reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4076>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4076>
The debian firmware package doesn't actually contain it, costing us a
minute of boot time waiting for it to show up.
Lightly-reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4076>
Knowing following:
- CMP writes to flag register the result of
applying cmod to the `src0 - src1`.
After that it stores the same value to dst.
Other instructions first store their result to
dst, and then store cmod(dst) to the flag
register.
- inst is either CMP or MOV
- inst->dst is null
- inst->src[0] overlaps with scan_inst->dst
- inst->src[1] is zero
- scan_inst wrote to a flag register
There can be three possible paths:
- scan_inst is CMP:
Considering that src0 is either 0x0 (false),
or 0xffffffff (true), and src1 is 0x0:
- If inst's cmod is NZ, we can always remove
scan_inst: NZ is invariant for false and true. This
holds even if src0 is NaN: .nz is the only cmod,
that returns true for NaN.
- .g is invariant if src0 has a UD type
- .l is invariant if src0 has a D type
- scan_inst and inst have the same cmod:
If scan_inst is anything than CMP, it already
wrote the appropriate value to the flag register.
- else:
We can change cmod of scan_inst to that of inst,
and remove inst. It is valid as long as we make
sure that no instruction uses the flag register
between scan_inst and inst.
Nine new cmod_propagation unit tests:
- cmp_cmpnz
- cmp_cmpg
- plnnz_cmpnz
- plnnz_cmpz (*)
- plnnz_sel_cmpz
- cmp_cmpg_D
- cmp_cmpg_UD (*)
- cmp_cmpl_D (*)
- cmp_cmpl_UD
(*) this would fail without changes to brw_fs_cmod_propagation.
This fixes optimisation that used to be illegal (see issue #2154)
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
Now it is optimised as such (note change of cmod in line 0):
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.nz.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
No shaderdb changes
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2154
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
We have native FMA which works for graphics usage (unlike Midgard where
it's really reserved for compute for various reasons), let's use it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
Instead of trying to reindex all the times, just be okay with consistent
but sparse indices, then figuring out the max index is easy enough.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
Same purpose as the Midgard version, but the implementation is
*dramatically* simpler thanks to our more regular IR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
We can move e v e n more code to be shared and let bi_block inherit from
pan_block, which will allow us to use the shared data flow analysis.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
Now that it's all abstracted nicely with an implementation shared with
Midgard, this is pretty easy to get.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
Ideally we would sync the compilers to use the same indexing scheme but
that's a lot more Midgard refactoring than I have time for right now.
This is good enough honestly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>