The issue is that the current narrowing patterns are not working in a lot
of cases, for example
(('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
is missing patterns like this:
32x3 %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000)
32x4 %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0)
32 %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz
or after some later transforms:
32x2 %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000)
32x2 %6 = vec2 %5, %1 (0x0)
32 %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy
This patch is heavily based on old branch from Ian Romanick from 2019.
r300 RV530 shader-db:
total instructions in shared programs: 128900 -> 128882 (-0.01%)
instructions in affected programs: 621 -> 603 (-2.90%)
helped: 10
HURT: 1
total cycles in shared programs: 191837 -> 191828 (<.01%)
cycles in affected programs: 799 -> 790 (-1.13%)
helped: 7
HURT: 1
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>
Introduce a new NIR pass called nir_remove_outputs which works on
lowered I/O intrinsics and can remove any output varying or sysval.
This is meant to replace custom solutions in drivers,
such as radv_remove_varyings and similar.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>
Only allow holes between the first and last used component.
Do not load unused components before the first used component.
This fixes test failures with a bunch of VK CTS tests
with allow_holes enabled on RADV:
dEQP-VK.tessellation.tess_io.max_in_out.with_f16.*
Fixes: 6286c1c66f
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
We don't need the shader_info fields anymore. sample and centroid fields
are unused. The interp field is already available from
si_shader_info::color_interpolate.
The loads don't need to be sysvals. Add also the _amd suffix.
Don't handle it in st_nir_lower_drawpixels either because the intrinsics
are created much later in compilation now.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802>
Applying the projector after the normalization breaks the coordinates, so
apply it early. Usually it's not even necessary for the cubemaps anyway,
but ARB_fragment_program and TGSI allow it.
Fixes: 52e71809 ("nir: Add a cubemap normalizing pass")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39087>
We don't need one bit per bitsize per instruction if only one actually
matters in the end.
First step towards moving NIR in the direction of full float_controls2
only.
Also rename this from fp_fast_math, because that name implied that 0 is
the no fast math mode, while the opposite was the case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
Those helpers can be called late (since it's mostly for debug
purposes). This can avoid surprises in the backend and also avoids
rerunning gather_info.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38995>
Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff
as-if hand written:
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
This will also fix NIR_DEBUG=extended_validation complaining about
invalid loop analysis. GCM will invalidate loop analysis if progress
was made, and depending on the removed instruction it will affect the
instr_cost.
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38932>
We don't bother with maximums or wrapping because it shouldn't come up
for IO intrinsics anyway.
fossil-db results on Battlemage:
Instrs: 231363032 -> 231359554 (-0.00%)
Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00%
Max live registers: 71873886 -> 71870438 (-0.00%)
Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%)
Totals from 1779 (0.23% of 788851) affected shaders:
Instrs: 774359 -> 770881 (-0.45%)
Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51%
Max live registers: 158193 -> 154745 (-2.18%)
Non SSA regs after NIR: 180104 -> 180219 (+0.06%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
When considering ((x << y) % divisor), we recursed to calculate
mod = (x % (divisor << y)) but incorrectly returned mod directly,
rather than the correct value, (mod << y).
(Note that we require divisor to be a power-of-two.)
As an example of this going wrong, (x << 1) % 4 was returning (x % 2)
which is 0 or 1, but x << 1 is 2x, which is always an even number so
the result mod 4 can only be 0 or 2.
Unit test suggested by Caio Oliveira during review.
Fixes: 2255375c4d ("nir: add nir_mod_analysis & its tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This matches my current understanding of nir_opt_copy_prop, including that
nir_opt_copy_prop always replaces movs with vecN.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
Due to rebasing not recognizing it as a conflict, it ended up having
the same value as nir_io_assign_color_input_bases_after_all_other_inputs.
Fixes: 9a2f1be814 - nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
GLSL defines gl_SampleMaskIn as :
"a fragment language that indicates the set of samples covered
by the primitive generating the fragment during multisample
rasterization"
when variable rate shading is enabled, a single invocation might cover
multiple samples. The lowering done in nir_lower_single_sampled() does
not account for that case, so add an option to selectively disable it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
On the assumption that nobody will use a sample id greater than the sample
count, have loop unrolling guess based on the driver's max sample count.
This unrolls a simple resolve shader with a uniform max samples on ir3 to:
value = vec4(0);
if (max_samples > 0) {
value += txf_ms(coord, 0);
if (max_samples > 1 {
value += txf_ms(coord, 1);
if (max_samples > 2){
value += txf_ms(coord, 2);
if (max_samples > 3) {
value += txf_ms(coord, 3);
for (i = 4; i < max_samples; i++)
value += txf_ms(coord, i);
}
}
}
}
...
This is only worth a 1% win on our microbenchmark as-is, but if we could
flatten those ifs out and pull the fadds out to the end, avoiding syncs
per load would be a big win. This seems like a first step.
I've taken a shot at updating drivers to set the value, and tried to leave
notes in places that drivers might update, and want to follow up with
updating the compiler option.
This affects over half the DX11 apps in shader-db-private.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>
The previous implementation seems to predate nir_instr_clone() and
duplicates a lot of the deref cloning code. This also makes the pass
preserve deref->arr.in_bounds correctly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>
This is potentially nicer for some drivers. AMD drivers will use it.
mesa_frag_result_get_color_index will be used often.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604>
nir_lower_clip_cull_distance_array_vars was sneakily updating
shader_info::clip/cull_distance_array_size.
This moves the gathering into a new function
nir_gather_clip_cull_distance_sizes_from_vars.
v2: remove assertions that prevented nir_lower_clip_cull_distance_array_vars
from being used with non-compact arrays
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>
I was wondering where this was disappearing to.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38789>
Since Metal doesn't pass clip distance into the fragment shader, we have to
do it ourselves. The CLIP_DIST0/1 varying slots are used to represent the
user-defined varyings we use to pass them from vertex to fragment and
a new intrinsic is added to represent the write to the built-in
clip_distance variable. Since the CLIP_DIST0/1 varying slots are not affected
by opt_varyings, there can be potential interface mismatches so the machinery
in msl_iomap.c is refactored to allow them to be output as a series of scalars
rather than vectors.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38839>
We used load_frag_coord_unscaled_ir3 for loading the fragment coord for
input attachments in GMEM, where the normal scaling for gl_FragCoord
shouldn't be used. However with custom resolve a different scaling will
apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename
the NIR options, in preparation for this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>