The unordered mode stored in dependencies might be a bitmask and not
only a single mode. In practice, only the "stronger" mode will stick.
Make sure that the code testing for the mode uses "&" instead of "==",
to avoid prevent some valid combinations to happen, e.g.
```
// ...
add(16) g104<1>F g94<1,1,0>F g34<1,1,0>F { align1 1H @7 $7.dst compacted };
```
which without the fix ends up as
```
// ...
sync nop(1) null<0,1,0>UB { align1 WE_all 1N F@7 };
add(16) g104<1>F g94<1,1,0>F g34<1,1,0>F { align1 1H $7.dst compacted };
```
Enables two tests for the scoreboard pass that illustrate this case.
For measuring the effect, re-enabled the sync.nop accounting on total of
instructions and got the following results.
```
Totals:
Instrs: 322041261 -> 321748285 (-0.09%)
Cycle count: 22864587567 -> 22863073741 (-0.01%)
Max dispatch width: 7989040 -> 7989024 (-0.00%); split: +0.00%, -0.00%
Totals from 88212 (9.78% of 902056) affected shaders:
Instrs: 102282050 -> 101989074 (-0.29%)
Cycle count: 12787629859 -> 12786116033 (-0.01%)
Max dispatch width: 525336 -> 525320 (-0.00%); split: +0.01%, -0.01%
```
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36096>
Add support for disabling the VRT (Variable Register Thread) feature.
The strategy here is to force the old BRW_MAX_GRF limit for the
register allocator (locks the upper limit) and make sure
ptl_register_blocks() always return that amount of blocks (locks
the lower limit).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35781>
This needs to be called before intel_nir_opt_peephole_ffma, so I
arbitrarilly decided to call it right before.
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17120227 -> 17118227 (-0.01%)
instructions in affected programs: 5854 -> 3854 (-34.16%)
helped: 51 / HURT: 0
total cycles in shared programs: 895497762 -> 894733940 (-0.09%)
cycles in affected programs: 4603518 -> 3839696 (-16.59%)
helped: 95 / HURT: 21
LOST: 1
GAINED: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35925>
The conversion from bit value to register file type is already done
by the brw_eu_inst_3src_a1_dst_reg_file in the FFC macro now, so doing it
again produced incorrect results.
Fixes: e7179232 ("intel/brw: Move encoding of Gfx11 3-src inside the inst helpers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13141
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35960>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
The GLSL compiler always lowers inputs to temps for VS and GS, so exclude
them from driver support because the GLSL compiler will no longer do that
unconditionally. Thus, indirect VS and GS inputs are completely untested
and broken in a lot of drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
We simplify the implementation by assuming the worse case, copying
entire per-vertex regions if necessary.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
The formula uses scalar indices (4bytes), not slots (16bytes).
We also incorrectly passed a scalar (vertex case) & slot (mesh case)
offset in the push constants. Use slots instead so that the value is
smaller and we can pack more stuff into fs_msaa_flags.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
load intrinsics don't have range
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
Consider an innocuous instruction like:
and(1) v250:UD, g0.3<0,1,0>:UD, 4294967264u NoMask group0
If register allocation decides to spill v250, it will see this
instruction and say, "Oh no! The other components of v250 aren't set, so
I'd better add a fill before that instruction!"
But it gets even worse than that... if register coalesce decided to
merge two of these, the live range gets massively extended because the
writes don't fully initialize the value. This causes the need to spill
these registers in the first place.
Changing that instruction to SIMD16 on Xe2 or SIMD8 on other platforms
alleviates these issues.
shader-db:
Lunar Lake
total instructions in shared programs: 17118324 -> 17113191 (-0.03%)
instructions in affected programs: 93701 -> 88568 (-5.48%)
helped: 42 / HURT: 6
total cycles in shared programs: 895422566 -> 895079488 (-0.04%)
cycles in affected programs: 30111338 -> 29768260 (-1.14%)
helped: 35 / HURT: 40
total spills in shared programs: 3588 -> 3304 (-7.92%)
spills in affected programs: 285 -> 1 (-99.65%)
helped: 10 / HURT: 0
total fills in shared programs: 2218 -> 1663 (-25.02%)
fills in affected programs: 556 -> 1 (-99.82%)
helped: 10 / HURT: 0
Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20059218 -> 20053563 (-0.03%)
instructions in affected programs: 96938 -> 91283 (-5.83%)
helped: 43 / HURT: 6
total cycles in shared programs: 884174588 -> 883536475 (-0.07%)
cycles in affected programs: 22105268 -> 21467155 (-2.89%)
helped: 35 / HURT: 27
total spills in shared programs: 5032 -> 4679 (-7.02%)
spills in affected programs: 355 -> 2 (-99.44%)
helped: 12 / HURT: 0
total fills in shared programs: 4782 -> 4113 (-13.99%)
fills in affected programs: 671 -> 2 (-99.70%)
helped: 12 / HURT: 0
Skylake
total instructions in shared programs: 19097658 -> 19097665 (<.01%)
instructions in affected programs: 14202 -> 14209 (0.05%)
helped: 0 / HURT: 5
total cycles in shared programs: 862058109 -> 862058267 (<.01%)
cycles in affected programs: 3450244 -> 3450402 (<.01%)
helped: 7 / HURT: 11
fossil-db:
Lunar Lake
Totals:
Cycle count: 31439652246 -> 31439652272 (+0.00%)
Totals from 2 (0.00% of 707091) affected shaders:
Cycle count: 2602 -> 2628 (+1.00%)
No other Intel platforms had any fossil-db changes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35721>
`getopt_long()` returns an `int`, not a `char`; putting the value in
a `char` before comparing it to `-1` was making the comparison always
fail, resulting in the invalid codepath taken that then fails with:
option `-' is invalid: ignored
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
`subprocess.Popen()` returns immediately, and the subprocess might not
have finished by the time `stdout` is read on the next line, spuriously
failing the tests.
`subprocess.check_output()` makes sure the output is available before
returning, solving this issue; it additionally raises an error if the
subprocess failed, giving a better error than a failed diff later in the
script.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
e4m3fn: 8bit floating point format with 4bit exponent, 3bit mantissa
and no infinities (finite only)
e5m2: 8bit floating point format with 5bit exponent, 2bit mantissa
and with infinities.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
This will later be encoded by the backend into the
LSC extended descriptor message.
Reworks:
* Sagar: Add nir_intrinsic_ssbo_atomic_swap
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
We're not currently using RANGE_BASE and we'll use BASE for offset
optimizations on Xe2+.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
Given that the intrinsic will be CSEed at the NIR level, we don't need to
preemptively set it up at the top of the shader. No change in HSW shader-db.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
Unless I've seriously missed something, we have the Z in the payload
(which we can always request if we need access to it and it's not already
passed to us due other WM IZ settings).
total instructions in shared programs: 4408303 -> 4408186 (<.01%)
instructions in affected programs: 1164 -> 1047 (-10.05%)
total cycles in shared programs: 142485036 -> 142484566 (<.01%)
cycles in affected programs: 26820 -> 26350 (-1.75%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
NIR catches that if you're just doing something like adding two smooth
inputs, we can do the multiply once on the result instead of on each
input. BRW shader-db results:
total instructions in shared programs: 4409146 -> 4408303 (-0.02%)
instructions in affected programs: 800761 -> 799918 (-0.11%)
total cycles in shared programs: 143203198 -> 142485036 (-0.50%)
cycles in affected programs: 79081682 -> 78363520 (-0.91%)
total sends in shared programs: 363044 -> 363042 (<.01%)
sends in affected programs: 33 -> 31 (-6.06%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>