Currently, ir3 backend compiler is lowering integer multiplication from:
dst = a * b
to:
dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)
by emitting this code:
mull.u tmp0, a, b ; mul low, i.e. al * bl
madsh.m16 tmp1, a, b, tmp0 ; mul-add shift high mix, i.e. ah * bl << 16
madsh.m16 dst, b, a, tmp1 ; i.e. al * bh << 16
which at that point has very low chances of being optimized.
This patch adds a new nir_algebraic.AlgebraicPass to performs this
lowering during NIR algebraic optimization passes, giving it a better
chance for optimizing the resulting code.
Reviewed-by: Eric Anholt <eric@anholt.net>
We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there. This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.
Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again. I'm not aware of any hardware which
need lowering for one bitsize and not the other.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use a helper function to get the sysval/attribute/varying/output name
and make the disam debug output independent of shader stage.
Reviewed-by: Rob Clark <robdclark@gmail.com>
We have the GLSL type, so we can just ask it how many coordinates there
are. The GLSL function already has Vulkan cases that we'd probably want
eventually.
Reviewed-by: Rob Clark <robdclark@gmail.com>
When comparing our sin/cos behavior to the closed source driver, I
noticed that we were off by a bit (or, in the case of 1/2pi, 3 bits).
Fixes:
dEQP-GLES3.functional.shaders.random.trigonometric.vertex.52
dEQP-GLES3.functional.shaders.random.all_features.vertex.0
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
../src/freedreno/vulkan/tu_device.c:900:4: error: initializer element is not constant
.minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 },
^
Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110698
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn’t necessary to assign a register. However it was also
missing out the part to fix up the types. This can happen if the
instruction is copy propagated to be a move from a constant half-float
input register. In that case it still needs to fix up the types.
Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump
when lowering the precision of the variables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Though it should be fixed in RA pass, it needs to be set correctly from
the beginning according to the bitsize of NIR dest.
v2: Would be better for mad,fddx,fddy to fixup later in RA pass.
[small cleanup of fallout from imov/fmov removal fallout]
Signed-off-by: Rob Clark <robdclark@chromium.org>
It seems to handle only 32-bit values for half constant registers
within floating point opcodes according to the blob driver.
So we need to convert back to 32-bit values from 16-bit values, when a
lower precision pass is in effect.
Signed-off-by: Rob Clark <robdclark@chromium.org>
If the type of dest reg and src reg of absneg opcode are different,
it shouldn't be considered as same type mov.
This patch becomes meaningful when we start to use mediump information for
doing precision lowering to 16bit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
eg. uniform mediump vec4 f;
This patch means nothing since there's no mediump lowering pass for now,
but will be meaningful when the pass land in the near future.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Previously the code to load from a constant instruction was always
using the u32 pointer. If the constant is actually a 16-bit source
this would end up with the wrong values because the pointer would be
offset by the wrong size. This fixes it to use the u16 pointer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
The aren't real instructions, and don't change # of live values, so no
point in them competing with real instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early. And factor the net change of # of
live values that would result from scheduling an instruction, to
prioritize instructions that reduce number of live values as the number
of live values increases.
For manhattan:
total instructions in shared programs: 27869 -> 28413 (1.95%)
instructions in affected programs: 26756 -> 27300 (2.03%)
helped: 102
HURT: 87
total full in shared programs: 1903 -> 1719 (-9.67%)
full in affected programs: 1390 -> 1206 (-13.24%)
helped: 124
HURT: 9
The reduction in register usage nets ~20% gain in manhattan. (So
getting mediump support should be a huge win for gles gfxbench.)
Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).
The effect is less pronounced on smaller shaders.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Account for shader outputs and values live in any direct/indirect
successor block.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.
Fixes:
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
Signed-off-by: Rob Clark <robdclark@chromium.org>
Blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of deqp's:
dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes. Keep track if we've done an (ss) since the
last ldlv, and if not add (ss) flag to last_input which gets (ei).
Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
The special handling for last_input assumes that all the varying loads
are in the first block. Add an assert to catch if anyone breaks that
assumption.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ncomp is never set for vertex shaders, but a3xx and a4xx still use it.
Fixes: 831f1a05c0 freedreno/ir3: rework varying packing
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it. There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.
The return type reflects better what the function name suggests. It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it. When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Replace its uses with nir_shader_get_entrypoint(), and change the
helper function to return nir_shader *.
This is a preparation to change spirv_to_nir() return type.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
On machines with many cores, you can run into that issue :
../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory
v2: Move declare_dependency around (Eric)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jan Ziak
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
shader-db's report.py will use this to see when we've changed loop
unrolling behavior on a shader and skip including other stats like
instruction count from being considered for that shader, since they won't
be useful as a proxy for real world performance in that case.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
__u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining
about the wrong type being passed in.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The .editorconfig helps with the tabs, but we've got this
two-tabs-from-previous-indentation line continuation style that requires
whacking the c-file-offsets. This will throw emacs warnings when first
opening a file in the directory, press '!' to shut it up for the future.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The editorconfig takes precedence over dir-locals in emacs26 with
editorconfig enabled, so the /.editorconfig was affecting these
directories.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ofc legacy gl features that are broken don't trigger fails in deqp. I
should remember to test glxgears more often.
Fixes: 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass. So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.
And I guess this situation will come up more as GS and tess is added
into the equation.
Since really everything about the const state is not specific to the
variant, move this. The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Next patch moves const_state to ir3_shader, before the compile context
is created. So move the code around in prep to call it earlier.
Signed-off-by: Rob Clark <robdclark@chromium.org>
They are really part of the constant state, and it will moving things
from ir3_shader_variant to ir3_shader if we combine them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout. Bunch of churn, but no
functional change.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Move to ir3_compiler so it doesn't depend on the compile context. Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).
Signed-off-by: Rob Clark <robdclark@chromium.org>
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(
i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32
only for vertex shaders. For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc. On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.
No changes on any other Intel platforms.
v2: Add panfrost changes.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Driver which do not support native integers should use a lowering
pass to go from integers to floats.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In commit 2f0b9d2249 ("freedreno/ir3: lower
load_barycentric_at_offset") a new file was added that needs to
also be added to the Makefile.sources list used by Android and
SCons build system.
Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 2f0b9d2249 ("freedreno/ir3: lower load_barycentric_at_offset")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Add libfreedreno_drm/ir3 to the build
Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: b4476138d5 ("freedreno: move drm to common location")
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Tweaked to add extra ir3 files from master]
Signed-off-by: John Stultz <john.stultz@linaro.org>