This lowering is smaller and -INT64_MIN is probably UB (signed overflow).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
ineg(INT_MIN)/iabs(INT_MIN) won't work as expected.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
If "a" is a multiple of "b", then the result would have been "b" instead
of 0.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0ef5f3552f ("nir: add strength reduction pattern for imod/irem with pow2 divisor.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
This adds support for SpvOpAtomicFlag operations.
This is just a simple implementation that lowers
Clear to Store 0
and
TestAndSet to Cas (0, -1)
There are likely platforms/hw that will want to
lower this all the way through NIR and into their
backend, but this will do for now.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12229>
The previous handling conflated RelPatchID and PrimID, which would
result in incorrect gl_PrimitiveID when doing draw splitting and didn't
work with PrimID passthrough which fills the VPC slot with the "correct"
PrimID value from the tess factor BO which we left 0. Replace PrimID in
the tess lowering pass with a new RelPatchID sysval, and relace PrimID
with RelPatchID in the VS input code in turnip/freedreno at the same
time so that there is no net change in the tess lowering code. However,
now we have to add new mechanisms for getting the user-level PrimID:
- In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
This means we have to add another register to our VS->TCS ABI. I
decided to put PrimID in r0.z, after the TCS header and RelPatchID,
because it might not be read in the TCS.
- If any stage after the TCS uses PrimID, the TCS stores it in the first
dword of the tess factor BO, and it is read by the fixed-function
tessellator and accessed in the TES via the newly-uncovered DSPRIMID
field. If we have tess and GS, the TES passes this value through to
the GS in the same way as the VS does. PrimID passthrough for reading
it in the FS when there's tess but no GS also "just works" once we
start storing it in the TCS. In particular this fixes
dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
The liveness information will be a superset of real liveness so it's
unlikely something will explode if it tries to use it. However, it is
out-of-date and should be re-run if someone really wants it.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12186>
Many places need to know the maximum or minimum possible value for a
given size integer... so everyone just open-codes their favorite
version. There is some potential to hit either undefined or
implementation-defined behavior, so having one version that Just Works
seems beneficial.
v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in
u_intmin. Noticed by CI. Lol. Rename functions
`s/u_(uint|int)(min|max)/u_\1N_\2/g`. Suggested by Jason. Add some
unit tests that would have caught the copy-and-paste bug before wasting
CI time. Change the implementation of u_intN_min to use the same
pattern as stdint.h. This avoids the integer division. Noticed by
Jason.
v3: Add changes to convert_clear_color
(src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
A future commit will replace st_vertex_program::input_to_index with
a prefix bitcount of inputs_read, but it needs vertex inputs to be
in the same order as vertex attribs.
Some of the FF definitions don't make sense with this ordering and are
removed.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11370>
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.
Fossil DB results on Sienna Cichlid (with NGGC on):
Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%
No Fossil DB changed with NGGC off.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
For TGSI, we need the coordinate, comparator, bias, and LOD all together
in the first two vec4 args, and by doing it in the backend we were
generating extra MOVs.
softpipe shader-db results:
total instructions in shared programs: 2985416 -> 2953625 (-1.06%)
instructions in affected programs: 499937 -> 468146 (-6.36%)
total temps in shared programs: 544769 -> 565869 (3.87%)
temps in affected programs: 105469 -> 126569 (20.01%)
i915g shader-db:
total instructions in shared programs: 371625 -> 369594 (-0.55%)
instructions in affected programs: 24903 -> 22872 (-8.16%)
total tex_indirect in shared programs: 11381 -> 11365 (-0.14%)
tex_indirect in affected programs: 43 -> 27 (-37.21%)
LOST: 7
GAINED: 16
The temps increase is the pre-existing issue that we never release temps
for NIR regs, which doesn't matter much for softpipe (just memory/cache
footprint) but does for i915g as seen by shaders that no longer compile
(though overall we seem to win).
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11912>
This allows the variables decorated with RelaxedPrecision to have the
proper precision. It is worth to note that the decorator can be
applied on other cases, but those would be handled on the future.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7614>
There are two distinct cases:
- The last member of a shader storage block (length determined at run-time)
- Implicitly-sized array (length determined at link-time)
Fixes: 273f61a005 ("glsl: Add parser/compiler support for unsized array's length()")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11952>
ARB_shader_storage_buffer_object extension (promoted to core in 4.3) allows us
to call .length() method on arrays declared without an explicit size. The length is
determined at link time as a maximum array access.
Fixes: 273f61a005 ("glsl: Add parser/compiler support for unsized array's length()")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11952>
This patch allows to shrink vecN instructions where
one or more components at any position are unused.
Stat changes for softpipe:
total instructions in shared programs: 2986101 -> 2985416 (-0.02%)
instructions in affected programs: 51216 -> 50531 (-1.34%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
ALU instructions of which not all components are read,
can be shrunk to the number of read components.
Previously, this would only remove trailing components.
This patch enables to remove components from any position.
Stat changes for softpipe:
total instructions in shared programs: 3001291 -> 2984698 (-0.55%)
instructions in affected programs: 225585 -> 208992 (-7.36%)
total loops in shared programs: 1389 -> 1358 (-2.23%)
loops in affected programs: 36 -> 5 (-86.11%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
Only fragment and some compute shaders support implicit derivatives.
They're totally meaningless without helper invocations and some
understanding of the dispatch pattern. We've got code to lower
nir_texop_tex in these shader stages to use an explicit derivative of 0
but it was pretty badly broken:
1. It only handled nir_texop_tex, not nir_texop_txb or nir_texop_lod.
2. It didn't take min_lod into account
3. It was conflated with adding a missing LOD parameter to opcodes
which expect one such as nir_texop_txf. While not really a bug,
this does make it way harder to reason about the code.
4. Unless you set a flag (which most drivers don't), it left the
opcode nir_texop_tex instead of nir_texop_txl which it should have
been.
This reworks it to go through roughly the same path as other LOD
lowering only with a constant lod of 0 instead of calling out to
nir_texop_lod. We also get rid of the lower_tex_without_implicit_lod
flag because most drivers set it and those that don't are probably
subtly broken. If someone really wants to get nir_texop_tex in their
vertex shaders, they can write a new patch to add the flag back in.
Fixes: e382890e25 "nir: set default lod to texture opcodes that..."
Fixes: d5ac5d6e83 "nir: Add option to lower tex to txl when..."
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
It's intel-specific, used to get at MSAA compression information.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
This expands on commit c54c42321e. See the code comment for full
justifications. At the time of the previous commit Ian wanted to
limit the relaxing of the rule to GLSL 3.30 as that was the highest
version of shaders seen in the wild that were having trouble with
the stricter rules.
However since then I've found that the long standing issue with tess
shaders failing to compile in the game 'Layers Of Fear' is due to
this same issue. The game uses 4.10 shaders and also makes use of
explicit varying locations, so here we relax the rule to 4.20 and
make sure to apply the restriction to shaders using varyings with
explicit locations also.
Fixes: c54c42321e ("glsl: relax rule on varying matching for shaders older than 4.00")
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11873>
This is required for Zink where the API ballot type is a uint64_t and
the "hardware" ballot type is uvec4.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11989>
This changes the pass to extract pinned instructions and not just unpinned
instructions when rescheduling instructions. This stops pinned instructions
from being bunched together when instructions are reinserted into the blocks
which can result in regressions with regards to cycles and instruction
counts on i965 and register use/Max Waves on AMD hardware.
In order to do this we also throw away the post-order depth-first
search linearization algorithm used to re-insert the instructions, which
itself causes possible regressions when instructions are reinserted into
a less than ideal new order (of which the bunched together pinned
instructions is one example). Instead we simply insert instructions in the
reverse order they were extracted. This will simply place instructions
that were scheduled earlier onto the end of their new block and
instructions that were scheduled later to the start of their new block.
With this everything should remain in order without the need to run
over uses.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
With this pass enabled in Intel drivers, running shader-db on
shaders/unity/38.shader_test resulted in
Program received signal SIGSEGV, Segmentation fault.
gcm_schedule_early_src (src=0x555555d45348, void_state=0x7fffffffba40) at ../../SOURCE/master/src/compiler/nir/nir_opt_gcm.c:297
297 if (info->early_block->index < src_info->early_block->index)
(gdb) print src_info->early_block
$1 = (nir_block *) 0x0
I tracked this down to an early exit from gcm_schedule_early_instr on
the parent instruction because instr->pass_flags was 0x1c. That
should be an impossible value for this pass, so I inferred that
pass_flags must have dirt left from some previous pass.
Fixes: 8dfe6f672f ("nir/GCM: Use pass_flags instead of bitsets for tracking visited/pinned")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
The rules here are the same as for texture instructions. The bits on
the intrinsic are the ground truth and are allowed to vary from the
deref a bit as-needed. If the intrinsic says PIPE_FORMAT_NONE, then we
can look at the variable, if visible, to get format information. This
means that we need to be careful when we rewrite intrinsics based on the
deref to only override the format from the _deref intrinsic from the
image variable unless the intrinsic is PIPE_FORMAT_NONE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>
Semantically, -1 means "Unknown; don't validate" but it's really only
used for derefs because they often need to be flexible. We don't really
need that flexibility for image intrinsics but this makes it more
consistent. More immediately useful is that this gives us the ability
to tell _deref forms of these intrinsics apart from the lowered ones.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>