No shader-db or fossil-db changes on any Intel platform.
v2: Simlify the logic for when to try constant folding. Do
commute_immediates before constant folding. Both suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>
Prior to Cherryview, align16 3src instruction sources had to have their
subregister number be DWord-aligned. Cherryview added a discontiguous
bit in the encoding to represent bit 1 of the subregister number. This
allows us to use packed HF sources.
Update the ISA encoding helpers to properly handle bit 1. While we're
at it, make them take a full subregister number and adjust accordingly,
rather than making the callers divide or multiply by some alignment.
Note that the destination subregister must still be DWord aligned, so
HF destinations must be strided.
Thanks to Ian Romanick for discovering that we were botching this.
BSpec: 12054, 12081
v2 (idr): Fix ordering of high and low bit parameters to brw_inst_bits.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
Gfx11+ doesn't support Align16 instructions anymore - only Align1 mode.
Bailing early for Align16 is important so that brw_hw_decode_inst
doesn't try to read Align16 related instruction fields on generations
where they no longer exist (which could trigger assertions).
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
This reduces the penalty the heuristic gives to SIMD32 shaders
relative to SIMD16 in presence of discard control flow on Xe2+. The
penalty was meant to account for the inefficient divergence behavior
of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs
bundled in groups of two, and each pair shared control flow logic so
both EUs could only execute instructions in lockstep, which meant that
SIMD32 shaders had an effective warp size of 64 on Gfx12.x.
This change switches back to more optimistic modelling of discard
divergence. With it we gain about 6% performance in a Shadow of the
Tomb Raider trace (tested on BMG).
One may wonder if there are still workloads that would suffer
materially from enabling SIMD32 for all pixel shaders on Xe2 instead
of using this heuristic, since Xe2 EUs have twice the GRF space, twice
the FPU throughput and better divergence behavior than Xe, but the
answer seems to be yes unfortunately: E.g. Superposition has some
pixel shaders where SIMD32 has substantially worse scheduling due to
the increased number of false dependencies due to higher register
pressure, and using SIMD32 for them reduces performance significantly.
The heuristic seems to model this correctly so it doesn't look like we
can do without it at least right now on Xe2.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>
We weren't allowing immediates in BFE at all. Gfx12+ supports
immediates in src0 (value) and src2 (width), but not src1 (offset).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31437>
HSD 14021541470 lists a HW bug on blitter engine where the compression pairing bit is
not programmed correctly for stencil resources.
Use RCS Engine to perform copy instead.
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31792>
Follow up of the work performed in decode to
support inline query.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Signed-off-by: Stéphane Cerveau <scerveau@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31765>
Should only impact Xe2+
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 02294961ee ("anv: stop using a binding table entry for gl_NumWorkgroups")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31799>
We're not using a binding table entry anymore for num_workgroups.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 02294961ee ("anv: stop using a binding table entry for gl_NumWorkgroups")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31799>
Fixes: 4aa3b2d ('anv: LNL+ doesn't need the special flush for sparse')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31737>
Emit dummy VF_STATISTICS state before each VF state.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31759>
This recommended values should improve the performance of async
compute in gfx20, we may want to tweek this for Linux but at least
this values should give us a better baseline than default values.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30796>
Setting the missing registers to specification recommended values that
is also the default value, so it is not expected any changes in
behavior or performance here.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30796>
DG2 has the 'Force Non-Coherent' fields but MTL and ARL has
'Z Async Throttle settings', so here adding the missing one.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30796>
At the point we were calling this, we hadn't necessarily cleaned up
derefs via nir_lower_vars_to_ssa, nor movs/vecs via copy propagation,
so it wasn't necessarily easy for this pass to see the actual usage of
the destination.
Moving this later allows us to detect f2f32(txf(...)) and avoid
converting it to a 16-bit txf (why convert with ALU instructions
when the sampler could do it for us?).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31750>
This also means the infrastructure added by @gallo in 1dc64d0613
("ci: Use merge-skips files during merge pipelines") can be used and all
the manual adding of these files can be dropped, reducing the likeliness
of bugs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31739>
Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL. The first two
are also used for the quad variants. Move their lowering from
NIR conversion to brw_lower_subgroup_ops.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
If we have a non-register-aligned source, MOV it to a new register
so that the invariant expected when generating SHADER_OPCODE_BROADCAST
is respected.
Added to ensure a later patch won't hit the `src.subnr == 0` assertion
in brw_broadcast() generation code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
Include in the helper which already take care of using exec_all() and
taking the first component of the result. Both are expected by
SHADER_OPCODE_BROADCAST.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
ANGLE has a waiver for certain XFB tests, but this wasn't properly applied
on Alder Lake and these tests weren't skipped there.
Add a global angle-skips.txt file so that we don't have to keep copy-pasting
these skips.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31721>
There is a fresher device type with a CML GPU, with also a bigger number of
boards. Those are more reliable, so also we can remove the manual rules.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26830>
ANGLE currently pulls absolutely loads of stuff that we don't need. Fix
it up so we don't need to do that anymore, so it's much faster to build.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31716>