The previous pass shrinking values stored on the stack might have left
some gaps on the stack (a vec4 turned into a vec3 for instance).
This pass reorders variables on the stack, by component bit size and
by ssa value number. The component size is useful to pack smaller
values together. The ssa value number is also important because if we
have 2 calls spilling the same values, then we can avoid reemiting the
spillings if the values are stored in the same location.
v2: Remove unused sorting function (Konstantin)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
For example, if we store to scratch a vec4 but only a subset of
components are used after the load operation.
v2: Use nir_intrinsic_write_mask (Konstantin)
Use u_foreach_bit() instead of u_bit_scan() (Konstantin)
Fix mask building loop (Konstantin)
v3: Fix reswizzle (Konstantin)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
Previously when considering whether to rematerialize or spill/fill
ssa_1954, we would go for a spill/fill :
vec4 32 ssa_388 = (float32)txf ssa_387 (texture_handle), ssa_86 (coord), ssa_23 (lod), 0 (texture), 0 (sampler)
...
vec1 32 ssa_1953 = load_const (0xbd23d70a = -0.040000)
vec1 32 ssa_1954 = fadd ssa_388.x, ssa_1953
vec1 32 ssa_1955 = fneg ssa_1954
This is because when looking at ssa_1955 the first time, we would
consider ssa_388 unrematerialiable, and therefore all values built on
top of it would be considered unrematerialiable as well.
The missing piece when considering whether to rematerialize ssa_1954
is that we should look at filled values. Now that ssa_388 has been
spilled/filled, we can rebuild ssa_1955 on top of the filled value and
avoid spilling/filling ssa_1955 at all.
This requires a bit more work though. We can't just look at an
instruction in isolation, we need to go through the ssa chains until
we find values we can rematerialize or not.
In this change we build a list of all ssa values involved in building
a given value, up to the point there we find a filled or a
rematerializable value.
In this particular case, looking at ssa_1955 :
* We can rematerialize ssa_388 from its filled value
* We can rematerialize ssa_1953 trivially
* We can rematerialize ssa_1954 because its 2 inputs are rematerializable
* We can rematerialize ssa_1955 because ssa_1954 is rematerializable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
Currently we do something like this :
ssa_0 = ...
ssa_1 = ...
* spill ssa_0, ssa_1
call1()
* fill ssa_0, ssa_1
ssa_2 = ...
ssa_3 = ...
* spill ssa_0, ssa_1, ssa_2, ssa_3
call2()
* fill ssa_0, ssa_1, ssa_2, ssa_3
If we assign the same possition to ssa_0 & ssa_1 in the spilling
stack, then on call2(), we know that those values are already present
in memory at the right location and we can avoid respilling them.
The result would be something like this :
ssa_0 = ...
ssa_1 = ...
* spill ssa_0, ssa_1
call1()
* fill ssa_0, ssa_1
ssa_2 = ...
ssa_3 = ...
* spill ssa_2, ssa_3
call2()
* fill ssa_0, ssa_1, ssa_2, ssa_3
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
For a follow up optimization, we would like to track scratch loads.
This isn't possible with global load/store intrinsics. So use a couple
of special intrinsic in the pass and only lower it to global
intrinsics at the end.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
For used by different counter.
Vulkan:
1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT,
sum generated primitives of all 4 streams when GS.
2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated
primitives for all 4 streams when VS/TES/GS.
3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated
and streamout primitives for all 4 streams when VS/TES/GS.
OpenGL:
1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated
primitives for all 4 streams when GS.
2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4
streams when VS/TES/GS.
3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout
primitives for all 4 streams when VS/TES/GS.
pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1.
xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
Based on si_create_passthrough_tcs() as that seemed the most generic of
the various different backend driver implementations. Uses the
load_tess_level_outer_default and load_tess_level_inner_default
intrinsics to load the gl_TessLevelOuter and gl_TessLevelInner values,
so driver will somehow need to implement those to load the values set
by pipe_context::set_tess_state() or similar.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
The current uniform query uploader for mat3 calcs things as
if the vector elements are f16vec4 wide, so fix the calcs
here to do the same.
Fixes GTF-GL46.gtf21.GL.mat3.mat3arraysimple_frag on llvmpipe
when 16-bit uniform lowering is allowed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14817>
nir_opt_preamble will be crucial to optimize out the lowering for array
textures on AGX, which involves this AGX-specific sysval.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
For example if src0 is 0x80000000 we should return 1, not 0.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: a5747f8ab3 ("nir: add opcodes for *find_msb_rev and lowering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>
Also modify all existing uses to pass a zero to this new src.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>
Previously, we always treated these as coherent, but now let's make
this configurable. Also set all current users to ACCESS_COHERENT.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>
These intrinsics will be used to lower NGG attributes to memory on
GFX11.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19173>
For radeonsi which load this from arg.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19166>
...even for non-pixel interpolation, for consistency. Otherwise backends get
funny intrinsics with interpolateAt:
vec4 32 ssa_4 = intrinsic load_interpolated_input (ssa_3, ssa_2) (base=1, component=0, dest_type=invalid /*0*/, io location=33 slots=1 /*161*/)
We know it'll be a float, but backends shouldn't need to special case this. (Or
maybe interpolated_input shouldn't have a dest_type index. I'd be ok with that
resolution too. But having one and not setting it consistently is wrong.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19085>
This hack in glsl_to_nir() to clean up after the glsl ir linker should
no longer be reachable. These type of linking opts are now done via
a nir based linker long after GLSL IR has been coverted to nir by
this pass.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19104>
I stumbled upon this old NIR pass (still in use by intel and broadcom)
and noticed how most of the code was NIR boilerplate that we have
helpers for. Rewrite the pass to use all the helpers.
v2: Fix cube map arrays.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>
This failed to take fabs of the first component, implementing an unintended
formula that would return the right results in some common cases but is wrong in
general:
max { x, |y|, |z| }
instead of the intended
max { |x|, |y|, |z| }
Reexpress the implementation to make correctness obvious.
Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>
If there is a single use of fmul, and that single use is fadd, it makes
sense to fuse ffma, as we already do. However, if there are multiple
uses, fusing may impede code gen. Consider the source fragment:
a = fmul(x, y)
b = fadd(a, z)
c = fmin(a, t)
d = fmax(b, c)
The fmul has two uses. The current ffma fusing is greedy and will
produce the following "optimized" code.
a = fmul(x, y)
b = ffma(x, y, z)
c = fmin(a, t)
d = fmax(b, c)
Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1
fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of
one multiply and an add. Depending on the ISA, that could impede
scheduling or increase code size. It can also increase register
pressure, extending the live range.
It's tempting to gate on is_used_once, but that would hurt in cases
where we really do fuse everything, e.g.:
a = fmul(x, y)
b = fadd(a, z)
c = fadd(a, t)
For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2
fadd. So what we really want is to fuse ffma iff the fmul will get
deleted. That occurs iff all uses of the fmul are fadd and will
themselves get fused to ffma, leaving fmul to get dead code eliminated.
That's easy to implement with a new NIR search helper, checking that all
uses are fadd.
shader-db results on Mali-G57 [open shader-db + subset of closed]:
total instructions in shared programs: 179491 -> 178991 (-0.28%)
instructions in affected programs: 36862 -> 36362 (-1.36%)
helped: 190
HURT: 27
total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%)
cycles in affected programs: 72.02 -> 70.56 (-2.02%)
helped: 28
HURT: 1
total fma in shared programs: 1590.47 -> 1582.61 (-0.49%)
fma in affected programs: 319.95 -> 312.09 (-2.46%)
helped: 194
HURT: 1
total cvt in shared programs: 812.98 -> 813.03 (<.01%)
cvt in affected programs: 118.53 -> 118.58 (0.04%)
helped: 65
HURT: 81
total quadwords in shared programs: 98968 -> 98840 (-0.13%)
quadwords in affected programs: 2960 -> 2832 (-4.32%)
helped: 20
HURT: 4
total threads in shared programs: 4693 -> 4697 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
v2: Update trace checksums for virgl due to numerical differences.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>
validate_ssa_def_dominance() asserts :
validate_assert(state, !BITSET_TEST(state->ssa_defs_found, def->index));
Because the previous validation lefts bits set when it processed the
IR.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18966>
In that case we need to use the sysval. That sysval can be optimized anyway in
the nonvariable case. Fixes test_basic.get_linear_ids on panfrost.
Fixes: 998d84fca5 ("nir/lower_system_values: Support lowering more intrinsics")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18662>
The new enum is called nir_selection_control_divergent_always_taken,
and it's almost the same as nir_selection_control_flatten.
The main difference between the two is that "flatten" represents
a choice made by the application but "divergent_always_taken" may
be applied by the compiler stack when it thinks this is beneficial.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
If the abs source modifiers is not supported for the current
instruction because it is an instruction with three sources we
may still see a parent mov that has the `abs` modifier. In this
case we must not propagate that abs modifier from that parent
instructions.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7350
Fixes: cd73b6174b
nir/lower_to_source_mods: Stop turning add, sat, and neg into mov
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18902>
GLSL's lower_vector_derefs already does this, and even if it didn't
nir_vector_extract() would when glsl-to-nir happens.
No effect on freedreno shader-db.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>
nir_lower_vars_to_ssa will split temp structs up anyway. This fixes a bug
where mediump wouldn't be propagated to the split vars.
The effect is tiny, I think just shuffling some code scheduling from
optimizing at different places. Affects Natural Selection 2, Serious Sam
3, 3dmark slingshot, and Lego Legacy.
freedreno shader-db:
total instructions in shared programs: 11315637 -> 11315993 (<.01%)
instructions in affected programs: 24861 -> 25217 (1.43%)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>
nir_lower_vars_to_ssa will split temp arrays up anyway. Fixes a bug where
split arrays wouldn't get their precision qualifier.
Helps mostly Android and skia shaders. Also affects Civ5, Witcher 2, and
Borderlands 2.
freedreno shader-db:
total instructions in shared programs: 11319395 -> 11319355 (<.01%)
instructions in affected programs: 65744 -> 65704 (-0.06%)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>