Soon we'll be allocating instructions out of a per-shader pool, which
means that if we don't free too many instructions during the main
optimization loop, the final nir_sweep() call will create holes which
can't be filled. By freeing instructions more aggressively, we can
allocate more instructions from the freelist which will reduce the final
memory usage.
Modified from Connor Abbott's original patch to rebase on top of
refactored DCE and so that the use-after-free in nir_algebraic_impl() is
fixed.
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12910>
Used by RADV/Radeonsi NGG culling. Pack them into a single vec4
load for radeonsi to reduce const buffer load.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Used by AMD GPU NGG line culling. We could use nir load
line width and viewport scale to calculate this in shader,
but this way needs expensive divide ops.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
This fixes the use of an uninitialzed value:
Conditional jump or move depends on uninitialised value(s)
bcmp (vg_replace_strmem.c:1203)
_mesa_add_sized_state_reference (prog_parameter.c:434)
st_nir_assign_uniform_locations(gl_context*, gl_program*, nir_shader*) (st_glsl_to_nir.cpp:209)
st_finalize_nir (st_glsl_to_nir.cpp:1041)
by 0x58271B9: st_glsl_to_nir_post_opts(st_context*, gl_program*, gl_shader_program*) (st_glsl_to_nir.cpp:571)
...
Uninitialised value was created by a heap allocation
malloc (vg_replace_malloc.c:381)
ralloc_size (ralloc.c:114)
ralloc_array_size (ralloc.c:218)
deref_offset_var (nir_lower_atomics_to_ssbo.c:47)
lower_instr (nir_lower_atomics_to_ssbo.c:111)
nir_lower_atomics_to_ssbo (nir_lower_atomics_to_ssbo.c:204)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18227>
xfb_decl_assign_location() assumes that arrays are going to be packed.
But some conditions might prevent packing (e.g: explicit location or
smooth interpolation mode).
Instead of assuming that packing will happen, this commit adds a check to
determine if it'll happen and use the result to compute the proper location.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2214
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18175>
Warning messages:
../src/compiler/glsl/tests/list_iterators.cpp:68:1: warning: 'InstantiateTestCase_P_IsDeprecated' is deprecated: INSTANTIATE_TEST_CASE_P is deprecated, please use INSTANTIATE_TEST_SUITE_P [-Wdeprecated-declarations]
../src/compiler/glsl/tests/list_iterators.cpp:187:1: warning: 'InstantiateTestCase_P_IsDeprecated' is deprecated: INSTANTIATE_TEST_CASE_P is deprecated, please use INSTANTIATE_TEST_SUITE_P [-Wdeprecated-declarations]
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18203>
Warning messages:
../src/compiler/nir/tests/serialize_tests.cpp:113:1: warning: 'InstantiateTestCase_P_IsDeprecated' is deprecated: INSTANTIATE_TEST_CASE_P is deprecated, please use INSTANTIATE_TEST_SUITE_P [-Wdeprecated-declarations]
../src/compiler/nir/tests/serialize_tests.cpp:119:1: warning: 'InstantiateTestCase_P_IsDeprecated' is deprecated: INSTANTIATE_TEST_CASE_P is deprecated, please use INSTANTIATE_TEST_SUITE_P [-Wdeprecated-declarations]
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18203>
These tests fail on Windows, because we open the expected files in
text-mode, performing EOL conversion. Instead, let's read them as binary
files, and manually UTF-8 decode them to get the expected result.
This fixes the tests on Windows for me.
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18179>
This intrinsic returns a Boolean. Both 1-bit and 32-bit versions must
be allowed. Otherwise, size mismatches will occur after lowering
1-bit Booleans to 32-bit.
Fixes: 4cbdf9ec4d ("nir,spirv: implement SpvOpImageSparseTexelsResident")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16547>
textureGatherOffsets always takes a highp array of constants. As
per the discussion in [1] trying to lower the precision results in segfault
later on in the compiler as textureGatherOffsets will end up being passed
a temp when its expecting a constant as required by the spec.
[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16547#note_1393704
Fixes: b83f4b9fa2 ("glsl: Add an IR lowering pass to convert mediump operations to 16-bit")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18101>
this is cumbersome to detect, so detect it here
the flag denotes the use of either bindless texture operations
or shader variables such that drivers can infer the use of bindless
descriptor management functionality
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18088>
When primitive is points, EndPrimitive can't be used to count
primitive. Need to use vertex count instead. And it's also not
needed to do vertex per primitive count and overwrite incomplete
primitive work for points.
Fixes: 2be99012e9 ("nir: Add ability to count emitted GS primitives.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17805>
this adds minimal validation for tex ops with derefs to check that the
dest type integer-ness matches the sampled type's integer-ness
the aim is to provide the most basic validation that nir is being modified
and created consistently, not to perform exact verification that
the types are identical
fix#6985
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17874>
This reverts commit 96fa23bca5.
The correct fix to the problem was a1bc152340, making this
change obsolete as the pass skips any vars marked with
always_active_io. There was no real advantage to allowing these
vars to be split because they can't be removed anyway. Also there
is no way to split varying arrays gracefully here due to the xfb
layout rules, and this change didn't handle arrays at all.
Removing this obsolete code also fixes an assert in the new CTS
test KHR-Single-GL45.enhanced_layouts.xfb_all_stages. The test
was legally adding xfb offsets to all vertex stages but since
we only mark the varyings in the final vertex stage with the
always_active_io flag the other stages were correctly lowering
to scalars but when an array with an offset hit this code it
asserted since it couldn't handle it.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: a1bc152340 ("spirv: mark variables decorated with XfbBuffer as always active")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6928
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17878>
uadd_carry returns 1 or 0, so ANDing with 1 is unnecessary. Probably
this was implemented thinking that it was returning a boolean value.
shader-db results for V3D:
total instructions in shared programs: 12463571 -> 12462964 (<.01%)
instructions in affected programs: 28994 -> 28387 (-2.09%)
helped: 110
HURT: 1
total uniforms in shared programs: 3704591 -> 3704588 (<.01%)
uniforms in affected programs: 247 -> 244 (-1.21%)
helped: 3
HURT: 0
total max-temps in shared programs: 2148138 -> 2148117 (<.01%)
max-temps in affected programs: 729 -> 708 (-2.88%)
helped: 23
HURT: 2
total sfu-stalls in shared programs: 21230 -> 21232 (<.01%)
sfu-stalls in affected programs: 0 -> 2
helped: 0
HURT: 2
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17903>
For OpenCL kernels we simply link together SPIR-V files, so the only case
where we are left with linking shaders together is libclc and we handle
that just fine.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17334>
Clang unconditionally adds those definitions if using a spirv LLVM target.
That's not a problem on its own, but clang's internal OpenCL header enable
a bunch of OpenCL extensions if those are set.
Lucky for us, we can simply undefine them and spare us the trouble of
finding an upstream solution to this problem :)
This fixes the OpenCL CTS' compiler features_macro test.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17334>
It now removes dead inline sampler variables and moves everything to the
end so we no longer need nir_move_inline_samplers_to_end().
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17334>
Also make the code cleaner and simplier.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17334>
AMD will use this to execute a lowering pass conditionally.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17693>
I'm sorry to whoever wrote this, but
(x - (int) (x < 0)) ^ -((int) (x < 0))
is not an acceptable way to write iabs.
Shader-db results on Intel Tiger Lake with lower_idiv enabled:
total instructions in shared programs: 21122548 -> 21122570 (<.01%)
instructions in affected programs: 2369 -> 2391 (0.93%)
helped: 2
HURT: 8
total cycles in shared programs: 791609360 -> 791608062 (<.01%)
cycles in affected programs: 114106 -> 112808 (-1.14%)
helped: 9
HURT: 1
If we make the Intel back-end less stupid, we get to 9/1 helped/HURT for
instructions as well but that's for a different MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17845>
Is a phi source is an undef, there's no point in copying it or really
caring about it at all. We would just end up inserting a mov from an
undef to a register. Instead, treat phi sources which point to an undef
as if the phi source doesn't exist.
This also prevents them from being included in phi webs which should
reduce the overall interference seen in the shader. Currently, if two
phis share an undef, their phi webs are consdiered to interfere. By
ignoring undefs we can get rid of this false interference and reduce the
size of phi webs. Reducing the number of things being copied by the
parallel copy instructions should also free up the paralle copy
algorithm and reduce the over-all churn of movs.
Shader-db results on Haswell:
total instructions in shared programs: 8156608 -> 8155406 (-0.01%)
instructions in affected programs: 164838 -> 163636 (-0.73%)
Shader-db results on Skylake:
total instructions in shared programs: 18227370 -> 18227359 (<.01%)
instructions in affected programs: 519 -> 508 (-2.12%)
helped: 6
HURT: 0
Shader-db results on Tigerlake:
total instructions in shared programs: 21167987 -> 21168025 (<.01%)
instructions in affected programs: 23701 -> 23739 (0.16%)
helped: 21
HURT: 27
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16817>
I copy-and-pasted one of these and people noted that we had a better tool,
so make sure nobody else copy and pastes it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17664>
The nir_opt_conditional_discard pass is called anyway and covers
discard/demote/terminate.
iris shader-db:
total instructions in shared programs: 8933422 -> 8933426 (<.01%)
instructions in affected programs: 48 -> 52 (8.33%)
helped: 0
HURT: 4
which is a synmark shader going from 12 to 13 instrs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17664>
This pattern almost always gets peephole-selected out anyway, but I
noticed it once I removed glsl opt_conditional_discard.
iris shader-db:
total instructions in shared programs: 8933934 -> 8933158 (<.01%)
instructions in affected programs: 75575 -> 74799 (-1.03%)
helped: 179
HURT: 15
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17664>