This is a per-component bit mask (0xf for each output).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 0e0c2574d1 ("radv: Add shader stats for inputs and outputs.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31593>
Unigine Heaven with AMD_DEBUG=mono has incorrect rendering on gfx11
because it doesn't set nir_io_semantics::dual_source_blend_index for
the second output, resulting in garbage asm.
Instead of trying to find out what's wrong, I decided to rewrite this
to make it the same as the LLVM IR path. It simplifies the code and fixes
Unigine Heaven with AMD_DEBUG=mono.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31669>
This also means the infrastructure added by @gallo in 1dc64d0613
("ci: Use merge-skips files during merge pipelines") can be used and all
the manual adding of these files can be dropped, reducing the likeliness
of bugs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31739>
Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL. The first two
are also used for the quad variants. Move their lowering from
NIR conversion to brw_lower_subgroup_ops.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
If we have a non-register-aligned source, MOV it to a new register
so that the invariant expected when generating SHADER_OPCODE_BROADCAST
is respected.
Added to ensure a later patch won't hit the `src.subnr == 0` assertion
in brw_broadcast() generation code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
Include in the helper which already take care of using exec_all() and
taking the first component of the result. Both are expected by
SHADER_OPCODE_BROADCAST.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
ANGLE has a waiver for certain XFB tests, but this wasn't properly applied
on Alder Lake and these tests weren't skipped there.
Add a global angle-skips.txt file so that we don't have to keep copy-pasting
these skips.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31721>
There is a fresher device type with a CML GPU, with also a bigger number of
boards. Those are more reliable, so also we can remove the manual rules.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26830>
Reverse engineered with the following OpenCL kernel:
kernel void add(global float* out, float a, float b) {
float r;
_viv_asm(CLAMP0MAX, r, a, b);
out[0] = r;
}
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31674>
Switch NAK to it.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31706>
For surfaces without a modifier, the surf_size check wasn't
necessary, but it was also invalid since surf_size is set later
(in gfx12_compute_miptree).
Since it's not required anyway, drop this check.
Fixes: 060d5dacfd ("ac: add gfx12 DCC shared code")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31683>
ffs(VRAM, GTT) returns the GTT bit as it's the smaller.
Simplify the code by explicitely selecting VRAM when both
domains are active, otherwise assert that only 1 bit is set.
Fixes: 593f72aa21 ("winsys/amdgpu-radeon: rework how we describe heaps")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31683>
ANGLE currently pulls absolutely loads of stuff that we don't need. Fix
it up so we don't need to do that anymore, so it's much faster to build.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31716>
Helps for the case when max_distance is set to ~0, where the pass would now
only create groups of two loads together due to overflow. Found while
experimenting with this pass on r300, however the only driver currently
affected is i915.
With i915 this change gains around 20 shaders in my small shader-db
(most notably some GLMark2, Unigine Tropics, Tesseract, Amnesia) at
the expense of increased register pressure in few other cases.
I'm assuming this is a good deal for such old HW, and this seems like what
was intended when the pass was introduced to i915, but anyway this
could be tweaked further driver side with a more optimized max_distance
value. Only shader-db tested.
Relevant i915 shader-db stats (lpt):
total tex_indirect in shared programs: 1529 -> 1493 (-2.35%)
tex_indirect in affected programs: 96 -> 60 (-37.50%)
helped: 29
HURT: 2
total temps in shared programs: 3015 -> 3200 (6.14%)
temps in affected programs: 465 -> 650 (39.78%)
helped: 1
HURT: 91
GAINED: 20
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: GKraats <vd.kraats@hccnet.nl>
Fixes: 33b4eb149e
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31529>
Quoting the documentation :
"The returned range is a half-open interval where all of the
addresses within the subimage are < end_tile_B."
This is obviously not true with images smaller than a logical tile.
Currently the code return 1.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24276>
lower_scan_reduce only worked when ballot_components equals one. This
commit adds support for arbitrary ballot_components.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>
This functionality will become more complex in the next commit so
separate it into a helper function.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>
build_subgroup_mask and build_ballot_imm_ishl will be needed by other
functions higher-up the file.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>
[WHY]
- was hardcoded to store 16 configs only
- as the config descriptor usage grows, more is needed
- in bypass case, we also generate a new config which is a waste
[HOW]
- change to use vector to store configs
- don't force new config desc if in bypass
- revise the vector API, reduce the parameter passing
[TESTING]
- Tested with corresponding test cases
Reviewed-by: Brendan Leder <breleder@amd.com>
Acked-by: Chih-Wei Chien <Chih-Wei.Chien@amd.com>
Signed-off-by: Roy Chan <roy.chan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31693>