The first intrinsic is intended to expose the value set by glLineWidth
to shaders internally. The second intrinsic exposes the value actually
sent to the hardware. This may be wider than the first one in order to
implement anti-aliasing. These will be used in later patches to
implement a line smoothing lowering pass.
v2: Add a second intrinsic for the expanded line width for
anti-aliasing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
The line coord is a coordinate along the axis perpendicular to the line.
It is in the range [0,1] between the two edges of the line. It is
available at least on Broadcom hardware.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
No drivers are using this anymore so we can delete it and not keep
maintaining this legacy code-path. If any drivers want this in the
future, they should use nir_lower_varst_to_explicit_types followed by
nir_lower_explicit_io.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
Add the possibility to specify the source components. This is necessary
to let the UBO/SSBO index have more than one component, and it also lets
us remove a few hand-rolled load intrinsic definitions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
a pass which rewrites gl_ClipDistance[n] to an undef if the corresponding
clip plane is disabled in the rasterizer state
this pass is needed for zink to handle api disables of clip planes
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5529>
Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.
Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions. In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on... I'm really
shocked that there were no test regressions!
All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.
total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs) min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel) min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>
In a piglit run on s390 a lot of double tests fail, explicitly
packing/shifting things rather than using memcpy seems to help
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5679>
If we have code like:
('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))
We would like to eliminate the conversions, but the existing rules can't
see into the the (heterogenous) vector. So instead of trying to
eliminate in one pass, we add opts to propagate the f2f16 into the
vector. Even if nothing further happens, this is often a win since then
the created vector is smaller (half2 instead of float2). Hence the above
gets transformed to
('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))
Then the existing f2f16(f2f32) rule will kick in for the first component
and constant folding will for the second and we'll be left with
('vec2', 'a@16', '#b@16')
...eliminating all conversions.
v2: Predicate on !options->vectorize_vec2_16bit. As discussed, this
optimization helps greatly on true vector architectures (like Midgard)
but wreaks havoc on more modern SIMD-within-a-register architectures
(like Bifrost and modern AMD). So let's predicate on that.
v3: Extend for integers as well and add a comment explaining the
transforms.
Results on Midgard (unfortunately a true SIMD architecture):
total instructions in shared programs: 51359 -> 50963 (-0.77%)
instructions in affected programs: 4523 -> 4127 (-8.76%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 86 x̄: 7.47 x̃: 6
helped stats (rel) min: 1.71% max: 28.00% x̄: 9.66% x̃: 7.34%
95% mean confidence interval for instructions value: -10.58 -4.36
95% mean confidence interval for instructions %-change: -11.45% -7.88%
Instructions are helped.
total bundles in shared programs: 25825 -> 25670 (-0.60%)
bundles in affected programs: 2057 -> 1902 (-7.54%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 26 x̄: 2.92 x̃: 2
helped stats (rel) min: 2.86% max: 30.00% x̄: 8.64% x̃: 8.33%
95% mean confidence interval for bundles value: -3.93 -1.92
95% mean confidence interval for bundles %-change: -10.69% -6.59%
Bundles are helped.
total quadwords in shared programs: 41359 -> 41055 (-0.74%)
quadwords in affected programs: 3801 -> 3497 (-8.00%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 57 x̄: 5.33 x̃: 4
helped stats (rel) min: 1.92% max: 21.05% x̄: 8.22% x̃: 6.67%
95% mean confidence interval for quadwords value: -7.35 -3.32
95% mean confidence interval for quadwords %-change: -9.54% -6.90%
Quadwords are helped.
total registers in shared programs: 3849 -> 3807 (-1.09%)
registers in affected programs: 167 -> 125 (-25.15%)
helped: 32
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.34 x̃: 1
helped stats (rel) min: 20.00% max: 50.00% x̄: 26.35% x̃: 20.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.54 -1.00
95% mean confidence interval for registers %-change: -29.41% -20.69%
Registers are helped.
total threads in shared programs: 2471 -> 2520 (1.98%)
threads in affected programs: 49 -> 98 (100.00%)
helped: 25
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.96 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.88 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].
total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4999>
GLSL shares functionality with ARB_vertex_program but the GLSL
spec defines the gl_LightSource builtin with a member order that
is different from the packing expected in ARB_vertex_program.
This difference introduces a need for specialist lowering code
when handling builtin structs that is not required for normal
uniform structs due to member location mismatches.
Since gl_LightSource can't be redefined it shouldn't matter if
we add the members in the order listed in the spec, just so long
as we add them all. So here we rearrange the definition of the
glsl builtin to reflex our internal layout and that of
ARB_vertex_program. This required for the following patch.
CC: <stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5656>
nir_load_store_vectorize_test.ssbo_load_adjacent_32_32_64_64 expectations
need to be fixed accordingly.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5589>
The load_per_vertex_{input,output} intrinsics simply mean that they're
reading an arrayed input/output, which have one element per invocation.
Most accesses to those use gl_InvocationID as the subscript. However,
it's totally possible to read any element of the array. For example,
an evaluation shader might read gl_in[2].gl_Position, or a control
shader might read output[0].
For threads processing a single patch, an input/output load is
convergent if and only if both sources (the per-vertex-array subscript
and the offset) are convergent. For threads processing multiple
patches, we continued to mark them divergent.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5613>
Previously all nir_intrinsic_load_uniform that were used as sources were
considered to be dynamically_uniform but when offsets of load_uniform
are indirect it can not be determined.
This fixes artefacts in Google Maps 3D view in V3D.
Fixes: 886d46b089 ("nir: Add a function to determine if a source is dynamically uniform")
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5587>
The GLSL to NIR compiler supports the LowerTessLevel flag to convert
gl_TessLevelInner/Outer from their GLSL declarations as arrays of
floats to vec4/vec2s to better match how they are represented in
hardware.
This commit adds the similar support to the SPIR-V to NIR compiler so
turnip can use the same IR3/NIR tess lowering passes as freedreno.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
This commit adds a tess_levels_are_sysvals flag to
spirv_to_nir_options similar to GLSLTessLevelsAsInputs in the GLSL to
NIR compiler options. This will be used by turnip as the tess IR3
lowering pass (ir3_nir_lower_tess) operates on TessLevelInner and
TessLevelOuter in the DS as sysvals.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
The scheduler has code to handle hardware that shares the same memory
for inputs and outputs. Seeing as the specific stages that need this is
probably hardware-dependent, this patch makes it a configurable option
instead of hard-coding it to everything but fragment shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
nir_deps_state is a struct used as a closure for calculating the
dependencies. Previously it had two fields copied out of the scoreboard.
The closure is initialised in two seperate places. In order to make it
easier to access other members of the scoreboard in the callbacks in
later patches, the closure now just contains a pointer to the scoreboard
and the two bits of copied information are removed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
load_per_vertex_input should probably be handled in the same way as a
regular load_input. I think the nir_schedule pass was written before V3D
had geometry shader support, so that is probably why it hasn’t taken
this into account until now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
This case is never hit, we don't have a nir intrinsic for this spirv
opcode. And when we do, I'm not sure if it would be vectorized or not.
So best just to drop this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5505>
The fix in the previous patch removed an erronous attempt to skip
resizing variable types in each stage. Now that has been removed
iterating over each shader stage is no longer required here.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5487>
The initial support tried to match uniform variables from different
shaders based on the variables pointer. This will obviously never
work, instead here we use the variables name whcih also means we
must disable this optimisation for spirv.
Using the base variable name works because when collecting uniform
references we never iterate past the first array dimension, and
only support resizing 1D arrays (we also don't support resizing
arrays inside structs).
We also drop the resized bool as we can't skip processing the var
just because is was resized in another shader, we must resize
the var in all shaders.
Fixes: a34cc97ca3 ("glsl: when NIR linker enable use it to resize uniform arrays")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3130
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5487>
All the other driver-specific intrinsics are at the end of the file so
Intel's should go there too.
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5503>
Validate that num_components is only set for vectorized instructions, to
prevent other nir passes or driver backends from mistakenly relying on
num_components for non-vectorized instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
Of the possible intrinsics generated, only load_ssbo is vectorized (and
store_ssbo is never generated)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
It's tricky to merge XFB-outputs correctly, because we need there to not
be any overlaps when we get to `nir_gather_xfb_info_with_varyings` later
on. We currently trigger an assert there if we end up merging here.
So let's not even try. This is an optimization, and we can optimize this
in safe cases later if needed. For now, let's play it safe.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5329>