The state value of main_uniform_storage_index will be wrong for
add_parameter() when find_and_update_previous_uniform_storage()
finds a uniform if there is more than 1 uniform used in
multiple shader stages.
The new code is also simpler.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL). The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.
This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem. In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 instructions less than the
original on Intel hardware.
This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).
Fixes: d5cf6e92b4 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b7 "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
SPV_AMD_shader_image_load_store_lod allows to use a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.
According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
New SPIR-V capability for SPV_AMD_shader_image_load_store_lod.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only the blocks that are reachable are inserted with an end_nop
instruction at the end.
When handling the Phi second pass, if the Phi has a parent block that
does not have an end_nop then it means this block is unreachable, and
thus we can ignore it, as the Phi will never come through it.
Fixes dEQP-VK.graphicsfuzz.uninit-element-cast-in-loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is adapted from the GLSL IR code but doesn't need to
iterate over the IR. I believe this also fixes a potential bug in
the GLSL IR code which potentially counts the same output twice.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This will allow us to do some linking in NIR that was previously
done by the GLSL IR linker. To start with this just has calls for
linking atomics.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is pretty much a copy of link_check_atomic_counter_resources()
updated to work with the NIR linker.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
A NIR based glsl linking function will be too different to the
spirv version to bother attempting any sharing. So lets change
the name to be explicit.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When SSBO array is used with packed layout, both IR tree
and as a result, NIR tree will be incorrect.
In fact, the SSBO dereference indices won't
match the array size in some cases like the following:
"layout(packed, binding=1) buffer SSBO { vec4 a; } ssbo[3];
out vec4 color;
void main() {
color = ssbo[2].a;
}"
After linking the IR and then NIR will have an SSBO array
definition with size 1 but dereference still will have index 2
and linked_shader->Program->sh.ShaderStorageBlocks
will contain just SSBO with name "SSBO[2]"
So this line should be removed at least as a workaround for now
to avoid error like:
Failed to find the block by name "SSBO[0]"
Fixes: 810dde2a "glsl/nir: Add a pass to lower UBO and SSBO access"
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is needed to be in agreement with spec requirements:
https://github.com/KhronosGroup/OpenGL-API/issues/46
Piers Daniell:
"We discussed this in the OpenGL/ES working group meeting
and agreed that eliminating unused elements from the interface
block array is not desirable. There is no statement in the spec
that this takes place and it would be highly implementation
dependent if it happens. If the application has an "interface"
in the shader they need to match up with the API it would be
quite confusing to have the binding point get compacted.
So the answer is no, the binding points aren't affected by
unused elements in the interface block array."
v2: - 'original_dim_size' field moved above to keep
the struct packed better on 64-bit
- added a comment for 'total_num_array_elements' field
- fixed a binding point calculations for SSBOs array of arrays
( Ian Romanick <ian.d.romanick@intel.com> )
- fixed binding point calculations for non-packed SSBOs
v3:
- rename 'total_num_array_elements' to 'aoa_size'
( Jason Ekstrand <jason@jlekstrand.net> )
- rename 'boffset' to 'binding_stride'
( Alejandro Piñeiro <apinheiro@igalia.com> )
Fixes: 8cf1333b "glsl: link uniform block arrays of arrays"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is needed to fix these tests:
piglit.spec.arb_shader_storage_buffer_object.compiler.unused-array-element_frag
piglit.spec.arb_shader_storage_buffer_object.compiler.unused-array-element_comp
Fixes: 8cf1333b "glsl: link uniform block arrays of arrays"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fixes: 624789e370 "compiler/glsl: handle case where we have multiple users for types"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Etnaviv also does the same thing, so let's try to avoid repetition here,
and use the same for it code as well.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
i965 and iris use inputs_read/outputs_written for a shader stage to
determine the layout of input and output storage. Adjacent stages must
agree on the layout, so adjacent input/output bitfields must match.
This patch adds a new nir_shader_compiler_options::unify_interfaces
flag which asks the linker to unify the input/output interfaces between
adjacent stages.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>
These were missed when the fields got added. Added it everywhere where
texture_index got used and it made sense.
Found this in "The Surge 2", where the inliner does not copy the fields,
resulting in corruption and hangs.
Fixes: 3bd5457641 "nir: Add a lowering pass for non-uniform resource access"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1203
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3246>
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8 and 16
component vectors.
In order to avoid fixing up the non-autogenerated nir_build_alu() sites
and making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has
nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is
used for the > 4 src args case).
v2 (Karol Herbst):
use nir_build_alu2 for vec8 and vec16
use python's array multiplication syntax
add nir_op_vec helper
simplify nir_vec
nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
use nir_build_alu for opcodes with <= 4 sources
v3 (Karol Herbst):
fix nir_serialize
v4 (Dave Airlie):
fix serialization of glsl_type
handle vec8/16 in lowering of bools
v5 (Karol Herbst):
fix load store vectorizer
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Currently when lowering mod() we add an extra instruction so if
mod(a,b) == b then 0 is returned instead of b, as mathematically
mod(a,b) is in the interval [0, b).
But Vulkan spec has relaxed this restriction, and allows the result to
be in the interval [0, b].
This commit takes this in account to remove the extra instruction
required to return 0 instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Here we use the NIR based builder to add everything to the resource
list execpt for SSO packed varyings. Since the details of those
varyings get lost during packing we leave the special handing to
the GLSL IR pass for now. In order to do this we add some bools
to the build resource list functions.
Using the NIR based resource list builder gets us a step closer to
using a native NIR based linker. It should also be faster than the
GLSL IR builder, one because the NIR optimisations should mean we
add less entries due to better optimisations, and two because nir
gives us better lists to work with and we don't need to walk the
entire IR to find the resources.
Ack-by: Alejandro Piñeiro <apinheiro@igalia.com>
This adds support for adding names of varying to the resource list
which is required for us to use this function with the glsl linker.
Support for names is optional for spirv which is why it had not been
added yet.
This is mostly a copy of the GLSL IR code adapted to nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
In order to be able to implement a NIR based glsl linker we need to
build the program resource list with NIR. This change delays the
remaping so that a later commit can call the NIR based resource
list builder.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It's undefined behavior UBSAN complains about, so fixing this will
reduce the noise a bit.
../src/compiler/nir/nir_clone.c:710:4: runtime error: null pointer passed as argument 2, which is declared to never be null"}
#0 0xac781be4 in clone_function ../src/compiler/nir/nir_clone.c:710"}
#1 0xac781be4 in nir_shader_clone ../src/compiler/nir/nir_clone.c:740"}
#2 0xacf99442 in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:54"}
#3 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960"}
#4 0xaae326bc in set_fragment_shader ../src/mesa/state_tracker/st_cb_clear.c:135"}
#5 0xaae326bc in clear_with_quad ../src/mesa/state_tracker/st_cb_clear.c:335"}
#6 0xaae326bc in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:518"}
#7 0x494d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)"}
#8 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#9 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#10 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#11 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#12 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This adds nir encoding for these, generating them from libclc
was very expensive, and this is a lot simpler.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
There is an alignment issue doing this the other way, the
spec clearly says vload/store don't require alignment.
Reviewed-by: Karol Herbst <kherbst@redhat.com>