One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.
Reviewed-by: Eric Anholt <eric@anholt.net>
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of only lowering system from variables, lower most to intrinsics
and let the lowering framework immediately lower the intrinsic. This
will result in a bit more instruction churn but it means that NIR code
builders can just use intrinsics instead of everything having to go
through variables.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of having context-aware builder functions, just provide lowering
for the system value intrinsics and let nir_shader_lower_instructions
handle the recursion for us. This makes everything a bit simpler and
means that the lowering can also be used if something comes in as a
system value intrinsic rather than a load_deref.
Reviewed-by: Eric Anholt <eric@anholt.net>
The stride was already overriden when using
lower_workgroup_access_to_offsets, so elaborate a bit the commentary
there.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use alignment to calculate the stride associated with the pointer
types. That stride is used when the pointers are casted to arrays.
Note that size alone is not sufficient, e.g. struct { vec2 a; vec1 b;
} will have element an element size of 12 bytes, but the stride needs
to be 16 bytes to respect the 8 byte alignment.
Fixes: 050eb6389a "spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need this when doing full software 64-bit emulation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Right now we don't have cache support for SPIR-V shaders (from
ARB_gl_spirv). Right now they are properly skipped because they fall
on the ff shader code path (no key, no name), but it would be better
to update current comments, and add some guards.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Allocate UniformDataDefaults and fill in the data defaults when
linking a SPIR-V program. Among other things, this allows program
serialization to work.
It allows the following piglit test (when run on SPIR-V mode) to pass:
spec/arb_get_program_binary/execution/uniform-after-restore.shader_test
v2: use memcpy to initialize UniformDataDefaults
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
From the ARB_program_interface_query specification:
"For the property TOP_LEVEL_ARRAY_SIZE, a single integer
identifying the number of active array elements of the top-level
shader storage block member containing to the active variable is
written to <params>. If the top-level block member is not
declared as an array, the value one is written to <params>. If
the top-level block member is an array with no declared size, the
value zero is written to <params>."
"For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
identifying the stride between array elements of the top-level
shader storage block member containing the active variable is
written to <params>. For top-level block members declared as
arrays, the value written is the difference, in basic machine
units, between the offsets of the active variable for consecutive
elements in the top-level array. For top-level block members not
declared as an array, zero is written to <params>."
v2: move top_level_array_size and stride into nir_link_uniforms_state
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
ARB_gl_spirv points that the offset must be explicit, however this is
true for 'root' types. For complex types, like struct members or
arrays of arraya, it needs to be computed.
We are not using the offset stored in the gl_buffer_variables during
the uniform blocks linking because currently we do not have a way to
relate a gl_buffer_variable with its corresponding gl_uniform_storage.
The GLSL path uses the name for that, but we can not rely on that
because names are optional in SPIR-V.
Notice that uniforms non-backed by a buffer object will have an offset
equal to -1, like in the GLSL path.
v2: add offset and var_is_in_block as per-variable state in
nir_link_uniforms_state (Arcady)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2: added TODO comment hinting possible future refactoring of
nir_build_program_resource_list and build_program_resource_list,
to avoid code duplication (Alejandro, to explicitly reflect a
valid concern from Timothy during the review).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
squashed on this one (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.
v2: add missing break (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This was probably not caught before because no supported test was
exercising the flrp lowering with other bit size different than 32.
With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end with something
like:
../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.
Fixes: 158370ed2a ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8f ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.
v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
A couple patches later in this series use the flag to avoid a few
thousand shader-db regresions on all vec4 platforms.
I'm not particularly enamored with the name of this flag. However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it. Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.
The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are two constructors for glsl_struct_field with different
parameters. Instead of repeating them for both constructors, this
patch adds a convenience macro. This will make it easier to add a
third constructor in a later patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.
v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
gl_MaxViewPorts to always be highp. (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers only use lower_io for modes where pointers don't have a
meaningful value, and dereferences can always be traced back to a
variable. But there can be other modes, like global mode with
VK_EXT_buffer_device_address, where pointers cannot be traced back to a
variable, and lower_io would segfault on loads/stores of these since
nir_deref_instr_get_variable() would return NULL.
Just use the mode on the deref itself to filter out these modes before
we try to get the variable.
Fixes: 118a66df99 ("radv: Use NIR barycentric coordinates")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles. Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition. The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition to all the way to a, b, and c.
Shader-db results on Kaby Lake:
total loops in shared programs: 4388 -> 4364 (-0.55%)
loops in affected programs: 29 -> 5 (-82.76%)
helped: 29
HURT: 5
Shader-db results on Haswell:
total loops in shared programs: 4370 -> 4373 (0.07%)
loops in affected programs: 2 -> 5 (150.00%)
helped: 2
HURT: 5
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure. This makes their callers
significantly simpler.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code. In all these cases, it can be very tricky to do properly because
of swizzles. This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
None of the current code knows what to do with swizzles. Take the safe
option for now and bail if we see one. This does have a small shader-db
impact but it is at least safe.
Shader-db results on Kaby Lake:
total loops in shared programs: 4364 -> 4388 (0.55%)
loops in affected programs: 5 -> 29 (480.00%)
helped: 5
HURT: 29
Shader-db results on Haswell:
total loops in shared programs: 4373 -> 4370 (-0.07%)
loops in affected programs: 5 -> 2 (-60.00%)
helped: 5
HURT: 2
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means. Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way. We also use the new
constant constructors to build the correct size constants.
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions. Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true. If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.
Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>