If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register. Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"
All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.
total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Effectivley unused since dd7135d55d ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5"). I had intended to
remove this code as part of that series, but I forgot.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We set header_present but then pass it some random garbage. Give it g0
instead. I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Drops one instruction from fs-sign-int.shader_test. No change in
shader-db due to it having 0 instances of sign(genIType). This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".
v2: Update commit message about shader-db.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The original idea was that the backend compiler could eliminate
surfaces, so we would have it mark which ones are actually used,
then shrink the binding table accordingly. Unfortunately, it's a
pretty blunt mechanism - it can only prune things from the end,
not the middle - since we decide the layout before we even start
the backend compiler, and only limit the size. It also basically
gives up if it sees indirect array access.
Besides, we do the vast majority of our surface elimination in NIR
anyway, not the backend - and I don't see that trend changing any
time soon. Vulkan abandoned this plan a long time ago, and I don't
use it in Iris, but it's still been kicking around in i965.
I hacked shader-db to print the binding table size in bytes, and
observed no changes with this patch. So, this code appears to do
nothing useful.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
assert(nir_intrinsic_write_mask(instr) ==
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
(1 << instr->num_components) - 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This was caused by 6339aba775 which added these completely valid
checks. However clang likes to complain about signedness mismatches.
Fixes: 6339aba775 "intel/compiler: Lower SSBO and shared..."
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally. It also means we can easily share the code between the vec4
and FS back-ends if we wish.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Everywhere we handle SSBO intrinsics, we have exactly the same pattern
for computing the index so we may as well make a helper for it. We also
add a get_nir_src_imm to vec4 and use it for SSBO offsets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't true for Vulkan so we have to whack it to "main" in anv which
is silly. Instead of walking the list of functions and asserting that
everything is named "main" and hoping there's only one function named
"main", just use the nir_shader_get_entrypoint() helper which has better
assertions anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is valid NIR but you can't actually hit this case today. GLSL IR
doesn't have a bool to double opcode; it does f2d(b2f(x)). In SPIR-V we
don't have any to/from bool conversion opcodes at all. However, the
next commit will make us start generating it so we should be ready.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No shader-db or CI changes on any Intel platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
No shader-db changes on any Intel platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is achived by copying the sign(abs(x)) optimization from the FS
backend.
On Gen7 an earlier platforms, this fixes new piglit tests:
- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI. Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.
An alternate solution is to delete the default case so that the compiler
will issue a warning. That may require more work since there are other
(impossible) cases that exist.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.
That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).
If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.
Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case. Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.
The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It is already done in NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We set a similar default value for LOD in the fs backend for TXS/TXL.
Without this we end up generating invalid MOV with a null src.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Embarassingly, someone enabled the ARB_shader_atomic_counter_ops
extension for Gen7+ but never added the intrinsics to the switch
statement in the vec4 backend, so they just hit an unreachable()
call and died.
Fixes: 40dd45d0c6 (i965: Enable ARB_shader_atomic_counter_ops)
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It was setting XYWZ swizzle and writemask to all uniforms, no matter if they
were a vector or scalar, so this can lead to problems when loading them
to the push constant buffer.
Moreover, 'shift' calculation was designed to calculate the offset in
DWORDS, but it doesn't take into account DFs, so the calculated swizzle
for the later ones was wrong.
The indirect case is not changed because MOV INDIRECT will write
to all components. Added an assert to verify that these uniforms
are aligned.
v2:
- Fix 'shift' calculation (Curro)
- Set both swizzle and writemask.
- Add assert(shift == 0) for the indirect case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The only thing still using it is INVOCATION_ID for geometry shaders.
That's easily enough inlined into the nir_intrinsic_load_invocation_id
handling code.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The NIR pass already handles remapping system values to attributes for
us so we delete the system value code as part of the conversion.
We also change nir_lower_vs_inputs to take an explicit inputs_read
bitmask and pass in the inputs_read from prog_data instead from pulling
it out of NIR. This is because the version in prog_data may get
EDGEFLAG added to it on some old platforms.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer. The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct. This, however, has
caused a few problems:
1) There are many things which generate NIR without GLSL. This means
we have to support both NIR shaders which come from GLSL and ones
that don't and need to have an info elsewhere.
2) The solution to (1) raises all sorts of ownership issues which have
to be resolved with ralloc_parent checks.
3) Ever since 00620782c9, we've been
using nir_gather_info to fill out the final shader_info. Thanks to
cloning and the above ownership issues, the nir_shader::info may not
point back to the gl_shader anymore and so we have to do a copy of
the shader_info from NIR back to GLSL anyway.
All of these issues go away if we just embed the shader_info in the
nir_shader. There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to
static stand-alone function. Don't write unused components in the
result. Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.
Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <eric@anholt.net>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_vec4_nir.cpp (Browse further)