We are going to disallow continue statements without
loop continue constructs.
Replaced with a test that checks that the optimization is not
applied in absense of actual work after the conditional break.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
We are going to disallow continue statements without
loop continue constructs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
This was missing when this intrinsic was added.
Fix some issue with FSI lowering and probably more.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: e779538ad2 ("nir: add nvidia IO intrinsics")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40543>
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 0092edfec0 ("nir/dead_cf: Do not remove loops with loads that can't be reordered")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40543>
The old calculation depended on the sample count, and gave subpar
results for 8x MSAA with standard sample locations. The new calculation
is based on the Intel pass, with some changing of the constants so that
the sample count is always proportional to alpha for 2xMSAA and 4xMSAA
and the addition of rotating the sample mask based on the pixel.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39335>
If dxvk's opencoded fmulz gets partially constant folded,
it leaves this mess behind.
It's important to do this before the more general fmul+b2f patterns added
in the next commit, because they change the signed zero behavior in a way
that can't be optimized back.
Foz-DB Navi48:
Totals from 36 (0.03% of 114655) affected shaders:
Instrs: 16513 -> 15706 (-4.89%)
CodeSize: 99756 -> 95760 (-4.01%)
Latency: 45165 -> 44151 (-2.25%)
InvThroughput: 8344 -> 7886 (-5.49%)
VClause: 395 -> 401 (+1.52%)
Copies: 639 -> 634 (-0.78%)
PreSGPRs: 1158 -> 1154 (-0.35%)
PreVGPRs: 1227 -> 1225 (-0.16%)
VALU: 11310 -> 10769 (-4.78%)
SALU: 813 -> 809 (-0.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
This means we respect the pattern order better because
simple replacements like bcsel(False, a, b) -> b no longer
insert movs that can block more specialized patterns.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
Valhall removed Bifrost's memory segments and added in its place memory
access. Those were bolted on reserved bits as "pseudo-segments" and the
emitter would catch these and emit the right memory access. This commit
cleans it up a bit by making memory_access available directly and
exposing it to NIR (this will be useful later).
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
This removes the previous hack that searched the psiz write by looking
for 16-bit stores with the correct pseudo segment. We also add a new
intrinsic that mimicks global stores but tags psiz writes, this will be
used later in the series.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
First, rename them to make them a bit more clear. They act on global
memory so they should be _global and they map to ld/st_cvt so so _cvt is
nice and obvious. Second, they don't need IO semantics as they're not
IO. But they do need ACCESS so that we can better control things like
CAN_REORDER. Third, add a src_type to store_global_cvt even though it
won't be used just yet because we'll want it for lowering VS stores.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Unlike load[_interpolated]_input, which has to deal with all sorts of
ABI nonsense between driver and compiler, these new intrinsics are
dumber than bricks. They're literally just the HW ops as NIR
intrinsics. These will allow us do the lowering in NIR and put the
driver in total control over what goes down what path. Among other
things, a driver could choose to lower some things to ld_var and others
to ld_var_buf.
Co-authored-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Move lowering to nir_lower_subgroups. At some point Intel
backend might want to skip that and lower at the backend IR
boundary, but for now lowering always applies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>
This comes up in lowered load_ubo sequences (observed in OpenCL test
test_api min_max_parameter_size). Hopefully the pack gets coalesced,
it's like nir_op_vec2 on most backends, so it should usually be ok to
sink even though the register pressure heuristic will reject it.
Allowing it to sink allows the UBO load to sink.
Intel's backend scheduler can optimize the relevant sequences locally
but there should still be a win here for global load sinking.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
-201983 +/- 1064.87
-9.31575% +/- 0.0468505%
(Student's t, pooled s = 1257.67)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
-4484.55 +/- 1288.68
-0.205419% +/- 0.0589514%
(Student's t, pooled s = 1521.99)
This should also use less memory.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
This means we don't run into undefined behavior when testing nan/inf inputs.
Also make sure that patterns using is_only_used_as_float are signed zero correct.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
These can never come from the API but there's a few cases where panvk
wants them.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>