Commit graph

4105 commits

Author SHA1 Message Date
Timur Kristóf
12652cc549 nir: Add pack_half_2x16_rtz_split opcode.
Same as pack_half_2x16_rtz_split, but always uses RTZ mode.
Note that pack_half_2x16 rounding mode is unspecified.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
2023-01-26 12:24:24 +00:00
Lionel Landwerlin
ff34e96701 nir/lower_io: fix bounds checking for 64bit_bounded_global
If the offset is negative like it's the case in

dEQP-VK.robustness.robustness2.bind.notemplate.r32i.unroll.volatile.storage_buffer_dynamic.readwrite.no_fmt_qual.len_256.samples_1.1d.comp

we end up passing the bounds checking condition because it's using
signed integers.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20762>
2023-01-19 09:16:40 +00:00
Alyssa Rosenzweig
e664082d35 nir/lower_blend: No-op nir_color_mask if no mask
In this usual case, do a quick check to avoid generating 5 useless instructions
(mov/vec4 instructions). They'll get copypropped but that creates more work for
the optimizer and nir/lower_blend runs in a hot variant path on both Asahi and
Panfrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Alyssa Rosenzweig
1fc25c8c79 nir/lower_blend: Handle undefs in stores
nir/lower_blend asserts:

   assert(nir_intrinsic_write_mask(store) ==
          nir_component_mask(store->num_components));

For the special blend shaders used in Panfrost, this holds. But for arbitrary
shaders coming out of GLSL-to-NIR (as used with Asahi), this does not hold. In
particular, after nir_opt_undef runs, undefined components can be trimmed.
Concretely, if we have the shader:

    gl_FragColor.xyz = foo;

Then this becomes in NIR

   gl_FragColor = vec4(foo.xyz, undef);

and then opt_undef will give the store_deref a wrmask of xyz but 4 components.
Then lower_blend asserts out.

Found in a gfxbench shader on asahi.

Closes: #6982
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Alyssa Rosenzweig
8b83210ab3 nir/lower_blend: Don't do logic ops on pure float
Per the spec.

Fixes arb_color_buffer_float-render on both Panfrost and Asahi (before/after
reproduced on Mali-T860 and AGX G13 respectively). Without that patch, that test
fails the assertion:

arb_color_buffer_float-render: ../src/compiler/nir/nir_lower_blend.c:259: nir_blend_logicop: Assertion `util_format_is_pure_integer(format)' failed.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Alyssa Rosenzweig
dbd0615e7a nir/lower_blend: Avoid useless iand with logic ops
The upper bits start correctly, there's no need to clear them as long as we keep
them zero'ed by using ixor with a valid bit mask instead of inot.

Makes the code generated for logic op slightly less ridiculous. I'm joking. It's
still ridiculous but I'm not in the mood to fix up the Midgard compiler and it's
just a little ALU for a feature almost nothing uses.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Alyssa Rosenzweig
ee127f03e4 nir/lower_blend: Fix SNORM logic ops
We need to sign extend. Incidentally this means the iand above is useless for
SNORM.

Fixes arb_color_buffer_float-render with GL_RGBA8_SNORM.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Alyssa Rosenzweig
f9839e7e1b nir/lower_blend: Clamp blend factors
Particularly constant colours, but also (more obscurely) SNORM.

Fixes arb_color_buffer_float-render with SNORM framebuffers. Issue affects both
Asahi and Panfrost (the latter after we start advertising EXT_render_snorm).

v2: Check the blend factor to avoid unnecessary clamps. This avoids regressing
blend shader code quality on Panfrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com> [v1]
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Alyssa Rosenzweig
fca457790e nir/lower_blend: Fix alpha=1 for RGBX format
In this case we have 4 components but the value of the fourth component
is undefined. Apply the fixup we already have.

Fixes
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.0
on Asahi. That test blend with DST_ALPHA with its RGB565 attachment,
which is fine if RGB565 is preserved, but Asahi is demoting that
format to RGBX8 which means -- after lowering the tilebuffer access --
we blend with an ssa_undef.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
2023-01-19 04:09:17 +00:00
Lionel Landwerlin
b82d9b1a3d nir/divergence: add missing RT intrinsinc handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20763>
2023-01-18 22:32:43 +00:00
Qiang Yu
49cfbe1fed nir/xfb_info: nir_gather_xfb_info_from_intrinsics update nir xfb_info
Use this function to update nir_shader->xfb_info.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19489>
2023-01-18 05:30:14 +00:00
Gert Wollny
2e05cfa179 nir: Add range_base to atomic_counter and an option to use it
Some drivers may encode constant offsets in the instruction, so
make it possible for the drivers to request lowering the atomic
uniform offset into the range_base variable of the intrinsic.

v2: drop patch to use build-in array offset evaluation, it makes
    problems with zink, and update the code accordingly
v3: always initialize range base

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
2023-01-17 13:19:04 +00:00
Gert Wollny
c4cde91c1b nir: Add possibility to store image var offset in range_base
Add the intrinsic range_base value to the image intrinsics and add
the option to store the image array offset into range_base instead
of adding it to the image array index if the driver requests it.

v2: Always initialize range_base

v3: fix for bindless intrinsics

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
2023-01-17 13:19:04 +00:00
Alyssa Rosenzweig
c3839bd540 nir: Optimize vendored sin/cos the same way
As we've done for the AMD one, to prevent any codegen regression from switching
the Midgard lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>
2023-01-16 22:20:43 +00:00
Alyssa Rosenzweig
a49ba0f1ae nir: Add Midgard-specific fsin/fcos ops
NIR has a fsin instruction that takes an argument in radians. Midgard instead
has an fsinpi argument that takes an argument in multiples of pi. So, we had a
NIR pass that would change fsin(x) to fsin(x / pi) and then map fsin to fsinpi
in the backend.

But that's invalid! In NIR, the opcode fsin is well-defined. fsin(x) means
something very different than fsin(x / pi). They won't usually be equal. The
transform fsin(x) -> fsin(x / pi) is fundamentally unsound.

It did work before, by accident. Most NIR passes don't care about the semantics
of ALU instructions.  fsin(x) and fsin(x / pi) are both well-defined but
fundamentally different NIR shaders. So while rewriting is wrong -- the NIR we
get out is not equivalent to the NIR we put in, and the Midgard ops we generate
are not equivalent to the NIR -- but if we don't run any passes that care about
the definition of fsin the two wrongs will cancel out to make a right.

However, some NIR passes do care about the definitions of ALU instructions,
instead of treating them as named black boxes. In particular, constant folding
(nir_opt_constant_fold) evaluates ALU instructions when their inputs are
constants, according to the definition in nir_opcodes.py. So our little charade
will only work if we don't call nir_opt_constant_fold, or if all the fsin
instructions have non-constant inputs. At the beginning of this series, that is
the case. With the later scalarization change, that's no longer the case, and
the unsoundness translates to real failing tests rather than a quibble of NIR's
semantics.

To mitigate, we define a new NIR opcode with the semantics we want and translate
fsin(x) = fsin_mdg(x / pi), where that equivalence does hold mathematically. So
the new translation is sound and doesn't rely on lucky pass ordering.

This matches the approach already used for AMD and AGX, which have fsin_amd and
fsin_agx opcodes respectively.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>
2023-01-16 22:20:43 +00:00
Jason Ekstrand
b39958a3a1 anv,nir: Move the ANV YCbCr lowering pass to common code
Nir changes:
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

Anv changes:

Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19950>
2023-01-16 14:10:21 +00:00
Jason Ekstrand
f02a11e4e4 nir: Add copyright and include guards to nir_vulkan.h
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19950>
2023-01-16 14:10:21 +00:00
Jason Ekstrand
433fe592ac nir/builder: Add some texture helpers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19480>
2023-01-13 20:25:01 +00:00
Jason Ekstrand
30f3fec380 nir: Add more opcodes to nir_tex_instr_is_query()
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19480>
2023-01-13 20:25:01 +00:00
t0b3
267dd1f4d5 nir/nir_opt_move: fix ALWAYS_INLINE compiler error
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Closes: #6825
Fixes: f1d20ec6 ("nir/nir_opt_move: handle non-SSA defs ")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17439>
2023-01-13 14:23:35 +00:00
Alyssa Rosenzweig
f4b3201244 nir/peephole_select: Allow load_preamble
load_preamble is intended to be almost free (costing at most a move), and it
does not have special bounds checking requirement, so it's ok to select with it.
With this, drivers that use nir_opt_preamble together with a late call to
peephole_select can optimize sequences like:

   if (x) {
      <uniform-on-uniform calculation>
   } else {
      <different uniform-on-uniform calculation>
   }

to simply

   bcsel(x, <uniform register 0>, <uniform register 1>)

rather than emitting needless control flow / branching over some moves.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20597>
2023-01-13 00:43:04 +00:00
Pavel Ondračka
3fcdd9e4a7 nir/lower_bool: ntt: Generate a good opcode for bcsel
This is heavily copy-pasted from a patch of Ian Romanick, including the
commit message.

Previously, this pass always generated fcsel for bcsel.  This was the
only place that generate fcsel, so various drivers assumed (and needed!)
that src0 was a Boolean with 0.0 or 1.0 as the only values.

Specifically, many DX9 / GL_ARB_vertex_program platforms lack a CMP
instruction in vertex shaders.  In those cases, they would use LRP to
implement fcsel.  The bummer is that many plaforms have a real fcsel
instruction, and those platforms would benefit from other places
generating that opcode.

Instead of leaving assumptions in drivers about the sources of an opcode
that they can't really support, allow them to control the way the
lowering pass translates bcsel.  Two flags are used to control this:

- If the driver sets has_fused_comp_and_csel in nir_options, fcsel_gt
  will be used.  Since the Boolean value is 0.0 or 1.0, this is
  equivalent to fcsel.

- If the parameter has_fcsel_ne is set, fcsel will be used.  This is the
  old path.

- Otherwise, the lowering pass assumes we're on a crufty, old DX9 vertex
  program, and it emits flrp.

With this, the assumptions about src0 of fcsel in NTT can be removed.
If a platform can't handle fcsel, it should ensure that the lowering
pass won't generate it.

No change in shader-db.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
2023-01-12 23:01:05 +00:00
Ian Romanick
70b25d9fe8 nir/lower_int_to_float: Add support for i32csel opcodes
These lower naturally to the corresponding fcsel opcodes.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
2023-01-12 23:01:05 +00:00
Alyssa Rosenzweig
2976548e4a nir/gather_info: Handle store_zs_agx
This acts as a depth/stencil write. The AGX compiler checks outputs_written to
determine what conservative depth settings the driver needs. Nominally, this
should work: the original store_output(FRAG_RESULT_DEPTH) intrinsic causes the
DEPTH outputs_written bit to be set, so the metadata is still correct after
lowering store_output to store_zs_agx. However, there are a handful of places
that call nir_gather_info late, which *resets* the existing outputs_written
value and regathers, causing Asahi to use the wrong conservative depth settings
when shuffling NIR pass order and breaking gl_FragDepth.

To fix, handle store_zs_agx conservatively when gathering info so we don't have
to play games with the pass order or stashing info in a sideband.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20563>
2023-01-11 21:14:20 +00:00
Alyssa Rosenzweig
cc5ca8164d nir: Add store_agx intrinsic
This works like store_global, but lets us optimize address arithmetic. Like
load_agx, it is formatted to match the hardware semantic. We don't make use of
any clever formats in this series, though.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20558>
2023-01-11 20:36:51 +00:00
Jesse Natalie
b0f3a387c9 nir_lower_fragcoord_wtrans: Support Vulkan shaders
In Vulkan shaders, you might not have all derefs pointing to a variable

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20400>
2023-01-10 04:25:26 +00:00
Pavel Ondračka
a93bc6afc4 nir: check for x - ffract(x) patterns when lowering f2i32
We already skip emitting ftrunc in nir_lower_int_to_float when there is
ffloor, fround or any other integer-making opcode preceding f2i32. However
if lower_ffloor is set for driver that doesn't support integers, the lowered
x - ffract(x) patterns would not be recognized and extra ftruct would be
emitted, doing unnecessary rounding.

This optimization only works if there is no non-trivial swizzling used for
the fadd, fneg and ffract involved, which seems to be 99% of the cases according
to my testing.

This is needed to enable nir ffloor lowering on r300 driver without regressions.

I'm not sure if this helps anybody else, the only hardware which sets
lower_ffloor and converts ints to floats (and can't do trunc) are some old
etnaviv cards, so maybe it will help there a bit.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
2023-01-05 12:01:32 +00:00
Qiang Yu
cf2ea3fce9 nir/xfb: save high_16bits output info
It is combined with slot location to identify a varying
when using VARYING_SLOT_VARx_16BIT.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20157>
2023-01-05 01:12:06 +00:00
Ian Romanick
db241fbd70 nir: Don't allow conflicting bitfield lowering passes
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Pavel Ondračka
53d9b696e4 nir: basic tests for nir_opt_shrink_vectors
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20213>
2023-01-03 12:32:33 +01:00
Pavel Ondračka
3305c9602d nir: fix shrinking of load_const for large vectors
Specifically when shrinking load_const with number of components
> 5, if the final number of components is not allowed (for example 8->6)
it would report false for progress even if we actually did some
reshuffling and also it would skip on the rewrite of the readers.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20213>
2023-01-03 12:32:33 +01:00
Pavel Ondračka
cb7f201288 nir: remove duplicate alu channels in nir_opt_shrink_vectors
This will clean code like:
   vec3 32 ssa_8 = frcp ssa_7.www
   vec3 32 ssa_9 = fmul ssa_7.xyz, ssa_8
into
   vec1 32 ssa_8 = frcp ssa_7.w
   vec3 32 ssa_9 = fmul ssa_7.xyz, ssa_8.xxx

This helps r300 driver because we can only do single channel for math
ops at a time, so the first version would result in three frcp
instructions. The nir_opt_shrink_vectors comments even claim the pass
should be doing this, however it actually does it only for nir_op_vecx
instructions, so extend this for generic alu instructions.

RV530 shader-db:
total instructions in shared programs: 135032 -> 133707 (-0.98%)
instructions in affected programs: 46121 -> 44796 (-2.87%)
helped: 452
HURT: 26
total temps in shared programs: 17051 -> 17033 (-0.11%)
temps in affected programs: 1509 -> 1491 (-1.19%)
helped: 91
HURT: 30

12.02->12.08 (+0.5%) fps gain in Unigine Sanctuary (n=5) with RV530

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7051
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reiewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20213>
2023-01-03 12:32:33 +01:00
Danylo Piliaiev
1c9ee30838 nir/fold_16bit_tex_image: Add type granularity for dst folding
Some HW may be able to fold only some of dst types, e.g.
for Adreno folding i32 -> i16 could cause a different result since
folded variant clamps the result instead of masking it.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20396>
2022-12-23 15:48:18 +01:00
Lionel Landwerlin
3af08b9c30 nir/divergence: handle shader_record_ptr intrinsic
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes 6b8fd65e84 ("spirv: Implement the new ray-tracing storage classes")

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20413>
2022-12-23 09:22:13 +00:00
Danylo Piliaiev
8482ad0110 nir/nir_lower_is_helper_invocation: Lower helper invocation if required
nir_lower_is_helper_invocation lowers intrinsic_is_helper_invocation
and uses load_helper_invocation (which is lowered by nir_lower_system_values).
While nir_lower_system_values may lower SYSTEM_VALUE_HELPER_INVOCATION
into intrinsic_is_helper_invocation.

So they depend on each other. Break the dependency by making
nir_lower_is_helper_invocation aware of lower_helper_invocation option
and emitting lowered load_helper_invocation when required.

Happens with SPIR-V 1.6 for which gl_HelperInvocation is translated into
"BuiltIn HelperInvocation" + "Volatile", which nir_lower_system_values
translates into is_helper_invocation.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19677>
2022-12-20 11:06:52 +00:00
Qiang Yu
e85c5d8779 nir/divergence_analysis: add missing intrinsics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Singed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
2022-12-19 09:22:24 +08:00
Qiang Yu
194add2c23 nir: lower image add lower_to_fragment_mask_load_amd option
Like lower_to_fragment_fetch_amd option in lower tex,
this is for radeonsi to lower MS image ops.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
2022-12-19 09:22:16 +08:00
Qiang Yu
1461b5f61b nir: add image fragment mask load intrinsic
Like nir_texop_fragment_mask_fetch_amd, this is used to load multi
sample image fmask data for AMD GPU.

We will lower multi sample image load and samples_identical intrinsics
to use it latter for radeonsi. RADV does not need this because it
always expand fmask images before dispatch compute shader.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
2022-12-19 09:22:11 +08:00
Alyssa Rosenzweig
71e2028ce3 nir: Add store_zs_agx intrinsic
Will be used for frag depth/stencil export with multisampling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
2022-12-17 18:10:28 +00:00
Karol Herbst
6d6c6caff1 nir_lower_io_to_scalar: handle load/store_global
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20106>
2022-12-16 08:02:32 +00:00
Karol Herbst
3cd641bebd nir_lower_io_to_scalar: make use of nir_get_io_offset_src
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20106>
2022-12-16 08:02:32 +00:00
Kenneth Graunke
0521027182 nir: Allow more than just ALU instructions in 'weak' GVN
This removes the ALU-only restriction on the "weak" GVN introduced by
the previous commit.  This makes it slightly more aggressive, allowing
it to coalesce things like UBO loads (still within sister then/else
blocks).  This also can have surprisingly large cascading effects.

I was concerned that this might increase register pressure, but
shader-db and fossil-db show effectively no change in spills/fills,
so it seems to be fine.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19823>
2022-12-14 20:56:55 +00:00
Kenneth Graunke
d5d03a7273 nir: Perform 'weak' global value numbering in all GCM passes
Full global value numbering (GVN) can be pretty aggressive, moving
values far away from their original locations, even out of loops,
and can extend their live ranges a lot.  So we've left it disabled.

This patch introduces a weaker form of GVN: we only allow coalescing
identical values when they appear on either side of the same if/else
construct.  For now, we also only allow ALU instructions.

This allows nir_opt_gcm to clean up identical instructions appearing
on both sides of if/then/else control flow.  But it avoids aggressively
combining every other occurrence of a value in the program.

This can still have surprisingly large cascading effects, as simple
constructs are cleaned up, leading to more opportunities to do the
same clean up, up a chain of nested ifs.  It also enables greater use
of the select peephole as ifs are cleaned up.

shader-db and fossil-db results show a reduction in spills/fills on
Icelake, so it doesn't seem to be hurting register pressure.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19823>
2022-12-14 20:56:55 +00:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
8b37046765 nir/builder: Handle i2b conversions specially in nir_type_convert
The shaders affected here are ones that were previously affected when
i2b was unconditionally lowered in opt_algebraic. There are a few places
where some transformations happen in a different order, so some
algebraic patterns are missed.

All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914369 -> 19914566 (<.01%)
instructions in affected programs: 92375 -> 92572 (0.21%)
helped: 0 / HURT: 90

total cycles in shared programs: 853851470 -> 853867215 (<.01%)
cycles in affected programs: 12400663 -> 12416408 (0.13%)
helped: 28 / HURT: 69

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16710721 -> 16710700 (<.01%)
instructions in affected programs: 108010 -> 107989 (-0.02%)
helped: 57 / HURT: 103

total cycles in shared programs: 884299412 -> 884306546 (<.01%)
cycles in affected programs: 12986423 -> 12993557 (0.05%)
helped: 87 / HURT: 102

total spills in shared programs: 14937 -> 14925 (-0.08%)
spills in affected programs: 12 -> 0
helped: 9 / HURT: 0

total fills in shared programs: 17569 -> 17557 (-0.07%)
fills in affected programs: 12 -> 0
helped: 9 / HURT: 0

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902347 (<.01%)
instructions in affected programs: 7311 -> 7317 (0.08%)
helped: 3 / HURT: 8

total cycles in shared programs: 741795500 -> 741792266 (<.01%)
cycles in affected programs: 273308 -> 270074 (-1.18%)
helped: 9 / HURT: 2

No shader-db changes on any other Intel platform.

No fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
ded3572947 nir: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
9f86d18b2d nir/builder: Add rounding mode parameter to nir_type_convert
Later changes will use this.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
9342c14eeb nir/builder: Emit x != 0 for nir_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variation that include both ine
and i2b), just emit a != 0 instead of i2b(a).

I think that the changes to the unit tests weaken them slightly, but
perhaps that's okay?

No shader-db changes on any Intel platform.  The GLSL paths use other
means to generate i2b operations, but the SPIR-V paths use nir_i2b.
Presumably since 4676b3d3dd (nir: Use nir_test_mask instead of
i2b(iand)), no fossil-db changes either.

v2: Use nir_ine_imm.  Suggested by Jesse.

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
7a5e9df39d nir: Use nir_i2b wrapper everywhere instead of using nir_i2b1 directly
No shader-db or fossil-db changes on any Intel platform.

v2: Add missed i2b1 in ir3_nir_opt_preamble.c.

v3: Add missed i2b1 in ac_nir_lower_ngg.c.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
b60b2f2add nir/algebraic: Optimize some b2i involved in masking operations
v2: Remove the ineg from the b2i in the ior pattern.  Suggested by
Jason.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914441 -> 19914369 (<.01%)
instructions in affected programs: 63507 -> 63435 (-0.11%)
helped: 24 / HURT: 0

total cycles in shared programs: 853869766 -> 853851470 (<.01%)
cycles in affected programs: 10551542 -> 10533246 (-0.17%)
helped: 24 / HURT: 0

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141163061 -> 141092683 (-0.0%)
Instructions helped: 14103
Instructions hurt: 55

Cycles in all programs: 9132376195 -> 9133183045 (+0.0%)
Cycles helped: 13775
Cycles hurt: 380

Spills in all programs: 18286 -> 18284 (-0.0%)
Spills helped: 1

Fills in all programs: 30647 -> 30643 (-0.0%)
Fills helped: 1

Gained: 133
Lost: 130

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00