The alignment can just be copied from the source intrinsic.
Fixes the assertion
nir_intrinsic_align_offset(instr) < nir_intrinsic_align_mul(instr)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5949>
Fixes the two-sided-lighting and vertex-program-two-side piglit tests
on Panfrost.
v2: Use an existing variable for gl_FrontFacing if present.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>
This is needed for handling drivers that use an input for loading the
face, for example Panfrost with Midgard GPUs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>
EXT_shader_framebuffer_fetch requires the fetched value to be per-sample, so we
need to load the sample id when in a fragment shader.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5930>
Whenever a struct type is decorated Block or BufferBlock we turn that
into a GLSL_TYPE_INTERFACE. Since these decorations can end up random
places, we should allow them for constants.
Closes: #3252
Fixes: 9d0ae777dd "spirv: Use interface type for block and buffer..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5855>
the final output of gl_Position needs this transform, and geometry shaders
must write this value for stream 0 if rasterization is enabled
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5851>
In the case where SSA use/def chains are broken, NIR prints out a very
cryptic error and then aborts. This abort happens during validation
rather than after the print is complete, hiding any other errors that
may have been found. One might think, "So what? Fix your use/def issue
first." However, what makes this especially bad is that, when use/def
chains are broken, there's usually a much nicer error inline in the
shader that would have been printed had we not aborted early so the
current behavior simply ensures you get the most cryptic error possible
in an already difficult-to-debug case.
While we're at it, we remove the one other case of abort() which is in
the validation of phi instruction sources.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5809>
this is needed for zink and other drivers which can support fragcolor but
not fragdata and want to correctly handle EXT_multiview_draw_buffers
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5687>
The first intrinsic is intended to expose the value set by glLineWidth
to shaders internally. The second intrinsic exposes the value actually
sent to the hardware. This may be wider than the first one in order to
implement anti-aliasing. These will be used in later patches to
implement a line smoothing lowering pass.
v2: Add a second intrinsic for the expanded line width for
anti-aliasing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
The line coord is a coordinate along the axis perpendicular to the line.
It is in the range [0,1] between the two edges of the line. It is
available at least on Broadcom hardware.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
No drivers are using this anymore so we can delete it and not keep
maintaining this legacy code-path. If any drivers want this in the
future, they should use nir_lower_varst_to_explicit_types followed by
nir_lower_explicit_io.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
Add the possibility to specify the source components. This is necessary
to let the UBO/SSBO index have more than one component, and it also lets
us remove a few hand-rolled load intrinsic definitions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
a pass which rewrites gl_ClipDistance[n] to an undef if the corresponding
clip plane is disabled in the rasterizer state
this pass is needed for zink to handle api disables of clip planes
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5529>
Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.
Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions. In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on... I'm really
shocked that there were no test regressions!
All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.
total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs) min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel) min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>
If we have code like:
('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))
We would like to eliminate the conversions, but the existing rules can't
see into the the (heterogenous) vector. So instead of trying to
eliminate in one pass, we add opts to propagate the f2f16 into the
vector. Even if nothing further happens, this is often a win since then
the created vector is smaller (half2 instead of float2). Hence the above
gets transformed to
('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))
Then the existing f2f16(f2f32) rule will kick in for the first component
and constant folding will for the second and we'll be left with
('vec2', 'a@16', '#b@16')
...eliminating all conversions.
v2: Predicate on !options->vectorize_vec2_16bit. As discussed, this
optimization helps greatly on true vector architectures (like Midgard)
but wreaks havoc on more modern SIMD-within-a-register architectures
(like Bifrost and modern AMD). So let's predicate on that.
v3: Extend for integers as well and add a comment explaining the
transforms.
Results on Midgard (unfortunately a true SIMD architecture):
total instructions in shared programs: 51359 -> 50963 (-0.77%)
instructions in affected programs: 4523 -> 4127 (-8.76%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 86 x̄: 7.47 x̃: 6
helped stats (rel) min: 1.71% max: 28.00% x̄: 9.66% x̃: 7.34%
95% mean confidence interval for instructions value: -10.58 -4.36
95% mean confidence interval for instructions %-change: -11.45% -7.88%
Instructions are helped.
total bundles in shared programs: 25825 -> 25670 (-0.60%)
bundles in affected programs: 2057 -> 1902 (-7.54%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 26 x̄: 2.92 x̃: 2
helped stats (rel) min: 2.86% max: 30.00% x̄: 8.64% x̃: 8.33%
95% mean confidence interval for bundles value: -3.93 -1.92
95% mean confidence interval for bundles %-change: -10.69% -6.59%
Bundles are helped.
total quadwords in shared programs: 41359 -> 41055 (-0.74%)
quadwords in affected programs: 3801 -> 3497 (-8.00%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 57 x̄: 5.33 x̃: 4
helped stats (rel) min: 1.92% max: 21.05% x̄: 8.22% x̃: 6.67%
95% mean confidence interval for quadwords value: -7.35 -3.32
95% mean confidence interval for quadwords %-change: -9.54% -6.90%
Quadwords are helped.
total registers in shared programs: 3849 -> 3807 (-1.09%)
registers in affected programs: 167 -> 125 (-25.15%)
helped: 32
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.34 x̃: 1
helped stats (rel) min: 20.00% max: 50.00% x̄: 26.35% x̃: 20.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.54 -1.00
95% mean confidence interval for registers %-change: -29.41% -20.69%
Registers are helped.
total threads in shared programs: 2471 -> 2520 (1.98%)
threads in affected programs: 49 -> 98 (100.00%)
helped: 25
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.96 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.88 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].
total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4999>
nir_load_store_vectorize_test.ssbo_load_adjacent_32_32_64_64 expectations
need to be fixed accordingly.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5589>
The load_per_vertex_{input,output} intrinsics simply mean that they're
reading an arrayed input/output, which have one element per invocation.
Most accesses to those use gl_InvocationID as the subscript. However,
it's totally possible to read any element of the array. For example,
an evaluation shader might read gl_in[2].gl_Position, or a control
shader might read output[0].
For threads processing a single patch, an input/output load is
convergent if and only if both sources (the per-vertex-array subscript
and the offset) are convergent. For threads processing multiple
patches, we continued to mark them divergent.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5613>
Previously all nir_intrinsic_load_uniform that were used as sources were
considered to be dynamically_uniform but when offsets of load_uniform
are indirect it can not be determined.
This fixes artefacts in Google Maps 3D view in V3D.
Fixes: 886d46b089 ("nir: Add a function to determine if a source is dynamically uniform")
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5587>
The scheduler has code to handle hardware that shares the same memory
for inputs and outputs. Seeing as the specific stages that need this is
probably hardware-dependent, this patch makes it a configurable option
instead of hard-coding it to everything but fragment shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
nir_deps_state is a struct used as a closure for calculating the
dependencies. Previously it had two fields copied out of the scoreboard.
The closure is initialised in two seperate places. In order to make it
easier to access other members of the scoreboard in the callbacks in
later patches, the closure now just contains a pointer to the scoreboard
and the two bits of copied information are removed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
load_per_vertex_input should probably be handled in the same way as a
regular load_input. I think the nir_schedule pass was written before V3D
had geometry shader support, so that is probably why it hasn’t taken
this into account until now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
All the other driver-specific intrinsics are at the end of the file so
Intel's should go there too.
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5503>
Validate that num_components is only set for vectorized instructions, to
prevent other nir passes or driver backends from mistakenly relying on
num_components for non-vectorized instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
Of the possible intrinsics generated, only load_ssbo is vectorized (and
store_ssbo is never generated)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>