No shader-db changes on any Intel platform.
fossil-db:
All Ice Lake and newer platforms had similar results. (Ice Lake)
Totals:
Instrs: 165513303 -> 165511820 (-0.00%)
Cycles: 15125314947 -> 15125211500 (-0.00%); split: -0.00%, +0.00%
Totals from 82 (0.01% of 656120) affected shaders:
Instrs: 544627 -> 543144 (-0.27%)
Cycles: 22616493 -> 22513046 (-0.46%); split: -0.46%, +0.00%
No fossil-db changes on Gfx9.
Suggested-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This adds optimizations for iadd, fadd, and ixor with reduce,
inclusive scan, and exclusive scan.
NOTE: The fadd and ixor optimizations had no shader-db or fossil-db
changes on any Intel platform.
NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size
and base-local-size.shader_test on DG2 and MTL. This is just changing
the code path taken to not use whatever path was not working properly
before.
This is a subset of the things optimized by ACO. See also
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The
min, max, iand, and ior exclusive_scan optimizations are not
implemented.
Broadwell on shader-db is not happy. I have not investigated.
v2: Silence some warnings about discarding const.
v3: Rename mbcnt to count_active_invocations. Add a big comment
explaining the differences between the two paths. Suggested by Rhys.
shader-db:
All Gfx9 and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300384 -> 20299545 (<.01%)
instructions in affected programs: 19167 -> 18328 (-4.38%)
helped: 35 / HURT: 0
total cycles in shared programs: 842809750 -> 842766381 (<.01%)
cycles in affected programs: 2160249 -> 2116880 (-2.01%)
helped: 33 / HURT: 2
total spills in shared programs: 4632 -> 4626 (-0.13%)
spills in affected programs: 206 -> 200 (-2.91%)
helped: 3 / HURT: 0
total fills in shared programs: 5594 -> 5581 (-0.23%)
fills in affected programs: 664 -> 651 (-1.96%)
helped: 3 / HURT: 1
fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165551893 -> 165513303 (-0.02%)
Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00%
Spill count: 45258 -> 45204 (-0.12%)
Fill count: 74286 -> 74157 (-0.17%)
Scratch Memory Size: 2467840 -> 2451456 (-0.66%)
Totals from 712 (0.11% of 656120) affected shaders:
Instrs: 598931 -> 560341 (-6.44%)
Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04%
Spill count: 983 -> 929 (-5.49%)
Fill count: 2274 -> 2145 (-5.67%)
Scratch Memory Size: 52224 -> 35840 (-31.37%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This doesn't help very much now. A later commit adds a NIR optimization
pass, tentatively called nir_opt_uniform_subgroup, that converts many
kinds of subgroup operations to things involving
bitCount(ballot(true)). This commit makes a huge difference in the
results of that later commit.
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165558033 -> 165557519 (-0.00%)
Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00%
Totals from 299 (0.05% of 656117) affected shaders:
Instrs: 88293 -> 87779 (-0.58%)
Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03%
v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
Otherwise the compiler generates an extra MOV to load the constant into
a register first because reasons. 🤷 vote_any, vote_all, vote_ieq,
and vote_feq handling already do this.
No shader-db changes on any Intel plaform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165592451 -> 165557937 (-0.02%)
Cycles: 15133282615 -> 15133059360 (-0.00%); split: -0.00%, +0.00%
Totals from 33779 (5.15% of 656115) affected shaders:
Instrs: 4396576 -> 4362062 (-0.79%)
Cycles: 86867412 -> 86644157 (-0.26%); split: -0.37%, +0.11%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
The problem seems to have been related to
nir_intrinsic_load_global_block_intel being marked as non-divergent.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This is divergent because it specifically loads sequential values into
successive SIMD lanes.
No shader-db or fossil-db changes on any Intel platform.
Fixes: 9f44a26462 ("nir/divergence: handle load_global_block_intel")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This is almost a 1:1 copy of the same function in libwayland. If the version
with the symbol propagates far enough the fallback can be removed again.
Signed-off-by: Sebastian Wick <sebastian.wick@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27511>
The fast-clear value is now the same for SNORM and UNORM, so our trick
of reinterpreting SNORM as UNORM when copying now works with UBWC. We
can also freely reinterpret UNORM, SNORM, and UINT formats, as tested by
dEQP-VK.image.mutable.*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27506>
We used to loop like this:
foreach (config_supported_by_driver) {
foreach (config_wl_knows_about) {
dri2_add_config(wl_config.rgba_masks, wl_config.rgba_shifts)) {
if (wl_config.rgba_masks != driver_config.rgba_masks ||
wl_config.rgba_shifts != driver_config.rgba_shifts) {
return NULL; /* driver config != wl config */
}
}
}
}
This is a pretty painful way to discover the relationship between the
different sets of configs, especially as we can just look up our Wayland
visual entry directly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27709>
This extension has been broken ever since the initial commit. It created
an XRGB DRIImage for the driver to render to, so whilst the presentation
was opaque, the buffer also completely lacked an alpha channel.
Fix it by making sure we only modify the FourCC we send to the Wayland
server when creating a buffer.
Closes: mesa/mesa#5886
Fixes: 9aee7855d2 ("egl: implement EGL_EXT_present_opaque on wayland")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27709>
This entire pattern really wants to be a shared helper, to allocate a
shadow linear image from another device and then import it across into
the rendering GPU. Querying the FourCC from the DRIImage makes it easier
to pull out into shared code.
This temporarily makes the implementation more ugly, however it's
already pretty hard on the eyes, so probably no great loss.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27709>