Save a comparison, and move out the comparison to be more backend friendly.
Saves 2 instrs on AGX (as the remaining comparison now fuses with bcsel).
Results on AGX, all affected shaders in asphalt9.
total instructions in shared programs: 1813003 -> 1812611 (-0.02%)
instructions in affected programs: 119646 -> 119254 (-0.33%)
helped: 333
HURT: 0
Instructions are helped.
total bytes in shared programs: 11870344 -> 11867208 (-0.03%)
bytes in affected programs: 820888 -> 817752 (-0.38%)
helped: 333
HURT: 0
Bytes are helped.
and on Mali-G57:
total instructions in shared programs: 2677538 -> 2677205 (-0.01%)
instructions in affected programs: 206923 -> 206590 (-0.16%)
helped: 333
HURT: 0
Instructions are helped.
total cvt in shared programs: 14667.50 -> 14662.30 (-0.04%)
cvt in affected programs: 1953.64 -> 1948.44 (-0.27%)
helped: 333
HURT: 0
Cvt are helped.
total quadwords in shared programs: 1450664 -> 1450544 (<.01%)
quadwords in affected programs: 5064 -> 4944 (-2.37%)
helped: 15
HURT: 0
Quadwords are helped.
total threads in shared programs: 53282 -> 53309 (0.05%)
threads in affected programs: 27 -> 54 (100.00%)
helped: 27
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26489>
The NIR intrinsics now take and return a barrier whenever one is
modified instead of modifying in-place. In NAK, we give the internal
instructions the same treatment and convert everything to use barrier
SSA values and RegRefs. In nak_from_nir, we move all barriers to/from
GPRs. We'll clean up the massive pile of OpBMov later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26463>
glsl_get_gl_type only accessed in src/compiler/glsl files, do not expose it
in libcompiler
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25879>
asahi will pass in 16bits, works fine if we convert before clamping. note we
don't try to be clever and make a smaller immediate because it would require
extra logic for negatives to make sure we don't have garbage in upper bits
(nir_validate checks that). do the simple, obviously correct thing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26440>
Because we are now doing the lowering in NIR we need to move the code
that sets the compact flag on some builtin vars out of the glsl to nir
pass.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26001>
If the shader passed to nir_lower_vars_to_scratch contains some unused
derefs to a variable that will be lowered, validation will fail because
the variable is not part of the shader after the pass.
cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26271>
This flag indicates the requirement of helper invocations
in fragment shaders, independent from any present instructions.
This fixes the lowering of OpGroupNonUniformQuad* instructions.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26026>
This is based off the original GLSL IR pass but it is much much
simpler as it doesn't need to do all of the hackery required in
GLSL IR to achieve the lowering.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25860>
Since v71, broadcom hw include specific packing/conversion
instructions, so this commit adds opcodes to be able to make use of
them, specially for image stores:
* pack_2x16_to_unorm_2x8 (on backend vftounorm8/vftosnorm8):
2x16-bit floating point to 2x8-bit unorm/snorm
* f2unorm_16/f2snorm_16 (on backend ftounorm16/ftosnorm16):
floating point to 16-bit unorm/snorm
* pack_2x16_to_unorm_2x10/pack_2x16_to_unorm_10_2 (on backend
vftounorm10lo/vftounorm10hi): used to convert a floating point to
a r10g10b10a2 unorm
* pack_32_to_r11g11b10 (on backend v11fpack): packs 2 2x16 FP into
R11G11B10.
* pack_uint_32_to_r10g10b10a2 (on backend v10pack): pack 2 2x16
integer into R10G10B10A2
* pack_4x16_to_4x8 (on backend v8pack): packs 2 2x16 bit integer
into 4x8 bits.
* pack_2x32_to_2x16 (on backend vpack): 2x32 bit to 2x16 integer
pack
For the latter, it can be easly confused with the existing
pack_32_2x16_split. But note that this one receives two 16bit integer,
and packs them on a 32bit integer. But broadcom opcode takes two 32bit
integer, takes the lower halfword, and packs them as 2x16 on a 32bit
integer.
Interestingly broadcom also defines a similar one that packs the
higher halfword. Not used yet.
Note that at this point we use agnostic names, even if we add a _v3d
suffix as they are only available for broadcom, in order to follow
current NIR conventions.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25726>
Nothing should currently hit this path.
The next commit adds code to nir_pack_bits and nir_unpack_bits that can
lead to this path being hit.
v2: Change nir_u2uN(..., 8) to nir_u2u8(...). Suggested by Alyssa.
v3: Don't generate nir_extract_u8 if the driver has set
lower_extract_byte. These instructions were causing some problems for
dozen.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> [v2]
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24741>
It should not be possible for this to happen now as the nir_pack_32_4x8
instruction that is being lowered shouldn't exist. A later commit will
change this.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24741>
Yet another bit of branchiness we should tame. 99% of the time, sources are not
for if's, so we shouldn't need to do the extra checking to handle that 1%.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
We don't check the sizes for ALU srcs, which is the hot path here, so split out
that simplified version for ALU instructions to use, while deriving a sized
version for other kinds of instructions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
There's no more nir_register.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
We have dominance validation elsewhere in the file.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Instead, check it at the call sites when actually required (basically just
intrinsics), reducing the branching required when not (ALU validation, the
hottest of hot paths for CI).
IMHO this is more obvious too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
No apparent performance difference, but documents the intention.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Nothing should ever be reading them, they logically do not exist. So there's no
point validating them, especially when the validation in question is so useless
(just checking the bit width, without any semantic awareness). Yet now that we
support vec16, this loop is quite hot even on scalar ISAs, and rather
pointlessly so. Just remove it and bring the ALU src validation complexity to
O(# of channels in source) instead of O(max # of channels in NIR).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
It doesn't inline and so is about 1% of M1 CTS time. Expand out the definition
and simplify the logic. Honestly, I think this is clearer too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Profiling showed that maintaining this ssa_srcs set consumes ~3% of CTS time
with a debugoptimized build. Unfortunately, we really do benefit from getting
this coverage in CI. So rather than remove the validation, let's optimize the
data structure used so we can keep the coverage at a fraction of the cost.
The expensive piece is the pointer set, which is backed by a relatively
expensive hash table. It would be much cheaper to use an invasive set instead,
with a single "present" bit. We don't want to bloat nir_src for this, however
there's an easy solution: use a tagged pointer to steal a bit in the nir_src for
the job. We untag everything at the end of validation (and this meta-invariant
is asserted with an auxiliary counter), so while we mutate the IR while
validating, the mutations do not escape nir_validate.
We tag the parent pointer and not the def pointer, because it is dramatically
less used and therefore has far fewer disrupted call sites.
The M1 job is improved from 3:03 to 2:55 of deqp-runner reported time, which is
excellent.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Deduplicates the "get # of channels" logic which was the same between the
helpers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>