Commit graph

5626 commits

Author SHA1 Message Date
Pavel Ondračka
33c8dc4f18 nir/nir_group_loads: reduce chance of max_distance check overflow
Helps for the case when max_distance is set to ~0, where the pass would now
only create groups of two loads together due to overflow. Found while
experimenting with this pass on r300, however the only driver currently
affected is i915.

With i915 this change gains around 20 shaders in my small shader-db
(most notably some GLMark2, Unigine Tropics, Tesseract, Amnesia) at
the expense of increased register pressure in few other cases.
I'm assuming this is a good deal for such old HW, and this seems like what
was intended when the pass was introduced to i915, but anyway this
could be tweaked further driver side with a more optimized max_distance
value. Only shader-db tested.

Relevant i915 shader-db stats (lpt):
total tex_indirect in shared programs: 1529 -> 1493 (-2.35%)
tex_indirect in affected programs: 96 -> 60 (-37.50%)
helped: 29
HURT: 2
total temps in shared programs: 3015 -> 3200 (6.14%)
temps in affected programs: 465 -> 650 (39.78%)
helped: 1
HURT: 91

GAINED: 20

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: GKraats <vd.kraats@hccnet.nl>
Fixes: 33b4eb149e
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31529>
2024-10-18 09:21:22 +00:00
Job Noorman
509606e56d nir/lower_subgroups: scan/reduce for multiple ballot components
lower_scan_reduce only worked when ballot_components equals one. This
commit adds support for arbitrary ballot_components.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>
2024-10-18 06:57:52 +00:00
Job Noorman
58b199f7ed nir/lower_subgroups: add build_cluster_mask helper
This functionality will become more complex in the next commit so
separate it into a helper function.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>
2024-10-18 06:57:52 +00:00
Job Noorman
e0cb4a94a3 nir/lower_subgroups: move up some helper functions
build_subgroup_mask and build_ballot_imm_ishl will be needed by other
functions higher-up the file.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>
2024-10-18 06:57:52 +00:00
Lionel Landwerlin
97b17aa0b1 brw/nir: rework inline_data_intel to work with compute
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in the compute shaders on Gfx12.5+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Georg Lehmann
dbf63a0788 nir: remove nir_op_is_derivative
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Georg Lehmann
f9d2aad7a3 nir: remove alu ddx/ddy
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Georg Lehmann
bf0d1a42b4 nir: remove uses_fddx_fddy
Unused and the code didn't even do what the comment said.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Georg Lehmann
cba575f4df nir: always emit ddx intrinsics
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Georg Lehmann
1371a8fe2b nir/opt_move_discards_to_top: handle ddx/ddy intrinsics
Fixes: daa97bb41a ("amd: switch to derivative intrinsics")

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Marek Olšák
948f94b8c5 nir/opt_varyings: pack TCS inputs with cross-invocation access together
Unigine Heaven has a TCS that reads pos.xyz and tescoord.w from all
invocations in every invocation. By putting those two in the same vec4,
AMD hw can reduce the amount of shared memory that is allocated for those
inputs from 2 vec4s to 1 vec4.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31670>
2024-10-17 03:30:07 +00:00
Marek Olšák
8e93907b7c nir/opt_varyings: assign locations of no_varying IO for TCS outputs only
Skip the code for other shader stages because it doesn't do anything there.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31670>
2024-10-17 03:30:07 +00:00
Connor Abbott
65c0846537 nir/lower_input_attachments: Handle unscaled input attachments with no index
With VK_KHR_dynamic_rendering_local_read we can have input attachments
with no index, which normally correspond to depth/stencil attachments,
and we have to handle this here when determining whether we need to emit
an unscaled fragcoord for FDM.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31261>
2024-10-17 00:30:44 +00:00
Connor Abbott
4bd506a7f3 spirv: Make the default input attachment index ~0
This will let us know when an input attachment doesn't have an
InputAttachmentIndex, which used to be illegal but is now allowed and
meaningful with VK_KHR_dynamic_rendering_local_read.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31261>
2024-10-17 00:30:44 +00:00
Job Noorman
4556b18f51 nir: add shuffle_{xor,up,down}_uniform_ir3 intrinsics
These are like shuffle_{xor,up,down} except they expect a dynamically
uniform index. This is necessary since the ir3 shfl instruction does not
work with a divergent index.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31501>
2024-10-16 22:05:10 +00:00
Danylo Piliaiev
7b09fc98fb nir/opt_16b_tex_image: Sign extension should matter for texel buffer txf
Texel buffer could be arbitrary large, so the assumption being made in
the following comment is wrong:

 "Zero-extension (u16) and sign-extension (i16) have
  the same behavior here - txf returns 0 if bit 15 is set
  because it's out of bounds and the higher bits don't matter."

Sign extension should matter for GLSL_SAMPLER_DIM_BUF.

This fixes the case of doing texelFetch with u16 offset:

  uniform itextureBuffer s1;
  uint16_t offset = some_ssbo.offset;
  value = texelFetch(s1, offset).x;

If the offset is higher than s16 optimization incorrectly
left it as 16b.

In spirv the above glsl is translated into:

  %22 = OpLoad %ushort %21
  %23 = OpUConvert %uint %22
  %24 = OpBitcast %int %23
  %26 = OpImageFetch %v4int %16 %24

Cc: mesa-stable

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31664>
2024-10-16 10:10:00 +00:00
Timothy Arceri
aa7c59e02c nir/glsl: set deref cast mode for blocks during function inlining
More cast fixes this time for UBO and SSBO. Which were missing testing
previously.

Fixes: d681cf96fb ("nir/glsl: set deref cast mode during function inlining")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11587

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31668>
2024-10-16 06:25:57 +00:00
Marek Olšák
0727634443 nir/opt_load_store_vectorize: vectorize load_smem_amd
radeonsi+ACO with the new vectorization callback:

TOTALS FROM AFFECTED SHADERS (19508/58918)
  VGPRs: 708672 -> 708864 (0.03 %)
  Code Size: 31458688 -> 31217160 (-0.77 %) bytes
  Max Waves: 305960 -> 305952 (-0.00 %)

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
a44e5cfccf nir/opt_load_store_vectorize: allow a 4-byte hole between 2 loads
If there is a 4-byte hole between 2 loads, drivers can now optionally
vectorize the loads by including the hole between them, e.g.:
    4B load + 4B hole + 8B load --> 16B load

All vectorize callbacks already reject all holes, but AMD will want to
allow it.

radeonsi+ACO with the new vectorization callback:

TOTALS FROM AFFECTED SHADERS (25248/58918)
  VGPRs: 871116 -> 871872 (0.09 %)
  Spilled SGPRs: 397 -> 407 (2.52 %)
  Code Size: 43074536 -> 42496352 (-1.34 %) bytes

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
80c156422d nir/opt_load_store_vectorize: allow overfetching, merge overfetched loads
New load merging transformations (first, second), examples:
    (vec4, vec3) ==> vec8(read=0x7f) (because NIR doesn't have vec7)
    (vec1, vec8(read=0x7f)) ==> vec8(read=0xff)
    - the unused component at the end of vec8 is dropped

Not merged:
    vec8(read=0xfe) + vec1
    - unused components at the beginning are kept

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
65ace5649b nir: reject unsupported component counts from all vectorize callbacks
If you allow an unsupported component count in the callback for loads,
nir_opt_load_store_vectorize will align num_components to the next supported
vector size, essentially overfetching.

This changes all callbacks to reject it. AMD will enable it in a later commit.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
02923e237d nir: add hole_size parameter into the vectorize callback
It will be used to allow merging loads with a hole between them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
8ce43b7765 nir/opt_load_store_vectorize: add entry::num_components
We will represent vec6..vec7, vec9..vec15 loads with 8 and 16
components respectively, so we need to track how many components
we really use.

This is a prerequisite for optimal merging up to vec16. Example:
    Step 1: vec4 + vec3 ==> vec7as8 (last component unused)
    Step 2: vec1 + vec7as8 ==> vec8 (last unused component dropped)

Without using the number of components read, the same example would end up
doing:
    Step 1: vec4 + vec3 ==> vec8
    Step 2: vec1 + vec8 ==> vec9 (fail)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Alyssa Rosenzweig
e9303c0952 nir: extract round component helper
another nir pass will use this.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
64c4d29e65 nir/opt_vectorize_io: fix stack buffer overflow with 16-bit output stores
uncovered by unrelated work

Fixes: 2514999c9c - nir: add nir_opt_vectorize_io, vectorizing lowered IO

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31644>
2024-10-15 03:59:17 +00:00
Timothy Arceri
46facf9037 nir/glsl: set cast mode for image during function inlining
Fixes: d681cf96fb ("nir/glsl: set deref cast mode during function inlining")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11980

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31554>
2024-10-15 03:13:24 +00:00
Konstantin Seurer
70a1453537 nir/print: Fix the alignment of 8-bit definitions
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31612>
2024-10-14 21:21:04 +00:00
Rhys Perry
67ad7359ff nir/divergence_analysis: disable phi undef optimization by default
If the backend does not implement this too, or some other future transform
modifiess the phi so that this isn't the case (replace the phi with a
bcsel or replace undef with zero), then it will not actually be uniform.

This keeps it enabled to some degree for RADV/ACO.

fossil-db (navi31):
Totals from 76 (0.10% of 79395) affected shaders:
Instrs: 195008 -> 195282 (+0.14%)
CodeSize: 1012592 -> 1015884 (+0.33%)
Latency: 3892826 -> 3898843 (+0.15%); split: -0.00%, +0.15%
InvThroughput: 460681 -> 460964 (+0.06%)
Copies: 13508 -> 13516 (+0.06%)
Branches: 5244 -> 5412 (+3.20%)
PreVGPRs: 5092 -> 5096 (+0.08%)
VALU: 116177 -> 116197 (+0.02%)
SALU: 23449 -> 23785 (+1.43%)

fossil-db (navi21):
Totals from 76 (0.10% of 79395) affected shaders:
Instrs: 164471 -> 164981 (+0.31%)
CodeSize: 883988 -> 888420 (+0.50%)
Latency: 4074287 -> 4082043 (+0.19%)
InvThroughput: 783783 -> 784276 (+0.06%); split: -0.00%, +0.06%
Branches: 5262 -> 5430 (+3.19%)
PreVGPRs: 5100 -> 5104 (+0.08%)
VALU: 116375 -> 116381 (+0.01%)
SALU: 23589 -> 23925 (+1.42%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30211>
2024-10-10 14:59:26 +00:00
Alyssa Rosenzweig
6287c8251d nir: add bounds_agx opcode
used to facilitate bounds checking optimization

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31532>
2024-10-05 18:30:11 +00:00
Iago Toral Quiroga
aac1c074cc nir: make fclamp_pos_mali and fsat_signed_mali opcodes generic
V3D can use these too.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>
2024-10-03 09:02:07 +00:00
Faith Ekstrand
62a4fe861a nir: Add an option to lower quad vote
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31470>
2024-10-02 21:10:32 +00:00
Marek Olšák
f546df95a6 nir/opt_vectorize_io: fix skipped output vectorization if inputs were vectorized
Fixes: 2514999c9c - nir: add nir_opt_vectorize_io, vectorizing lowered IO

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31454>
2024-10-02 20:26:23 +00:00
Job Noorman
4d50504b26 nir/lower_int64: add nir_intrinsic_rotate
Can simply be split into 32b ops.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31455>
2024-10-02 06:35:49 +00:00
Job Noorman
e6a5c342da nir/lower_int64: add nir_intrinsic_read_invocation_cond_ir3
Can simply be split into 32b ops.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31455>
2024-10-02 06:35:49 +00:00
Job Noorman
584b63ecab nir/load_store_vectorize: fix division by zero
Don't use glsl_get_explicit_stride as it may return 0 for vector types,
use nir_deref_instr_array_stride instead.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31460>
2024-10-02 05:53:57 +00:00
Rhys Perry
be64454710 nir/tests: test opt_loop_peel_initial_break with derefs in header block
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324>
2024-10-01 12:24:22 +00:00
Rhys Perry
0484044b1a nir/opt_loop: rematerialize header block derefs in their use blocks
Otherwise, we could end up with phis of derefs.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324>
2024-10-01 12:24:22 +00:00
Gert Wollny
f19f1ec17b nir/opt_algebraic: Allow two-step lowering of ftrunc@64 to use ffract@64
If ftrunc@64 is lowered by nir_lower_doubles it is turned into a
comparable long series of 32 bit operations. If the hardware
supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64
to use some combinations with ffloor@64. They can then be turned
into a combination of fsub@64 and ffract@64 resulting in less
all-over instructions.

Fixes: 5218cff34b
   nir/algebraic: avoid double lowering of some fp64 operations

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281>
2024-09-30 23:51:02 +00:00
Kenneth Graunke
0b34a7aff0 nir: Don't generate single iteration loops to zero-initialize memory
If the stride we're adding to our loop counter is larger than the total
amount of shared local memory we're trying to initialize, we know the
loop will run at most one time.  So we can skip emitting a loop.

Loop unrolling appears to be unable to detect this currently.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31312>
2024-09-30 05:27:17 +00:00
Georg Lehmann
bb7e8d51b6 nir: delete nir_opt_reuse_constants
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
2024-09-27 05:19:16 +00:00
Georg Lehmann
60776f87c3 nir/opt_remove_phis: rematerialize constants
Foz-DB Navi31:
Totals from 749 (0.94% of 79395) affected shaders:
Instrs: 1224359 -> 1223722 (-0.05%); split: -0.07%, +0.02%
CodeSize: 6468392 -> 6466296 (-0.03%); split: -0.06%, +0.03%
Latency: 9764410 -> 9766457 (+0.02%); split: -0.01%, +0.03%
InvThroughput: 1017401 -> 1017380 (-0.00%); split: -0.03%, +0.03%
VClause: 19902 -> 19873 (-0.15%); split: -0.16%, +0.02%
SClause: 38441 -> 38424 (-0.04%); split: -0.05%, +0.01%
Copies: 86880 -> 86304 (-0.66%); split: -0.73%, +0.06%
Branches: 34206 -> 34159 (-0.14%); split: -0.14%, +0.01%
PreSGPRs: 45557 -> 45527 (-0.07%); split: -0.08%, +0.01%
PreVGPRs: 32406 -> 32408 (+0.01%)
VALU: 671633 -> 671533 (-0.01%); split: -0.02%, +0.01%
SALU: 155284 -> 154675 (-0.39%); split: -0.40%, +0.00%
VMEM: 27303 -> 27271 (-0.12%)
SMEM: 67490 -> 67455 (-0.05%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
2024-09-27 05:19:16 +00:00
Georg Lehmann
40fc85c15b nir: make nir_instr_clone usable with load_const and undef
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
2024-09-27 05:19:16 +00:00
Georg Lehmann
a9f8089240 nir: replace nir_opt_remove_phis_block with a single source version
This is what callers actually want, and it simplifies nir_opt_remove_phis
because we can assume dominance meta data is valid.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
2024-09-27 05:19:16 +00:00
Georg Lehmann
41e82b8b8e nir: sink is_subgroup_invocation_lt_amd
Having it closer to the branches means we can eliminate an exec copy.

Foz-DB Navi31:
Totals from 11615 (14.63% of 79395) affected shaders:
Instrs: 6804372 -> 6804903 (+0.01%); split: -0.04%, +0.05%
CodeSize: 33684672 -> 33680584 (-0.01%); split: -0.07%, +0.05%
VGPRs: 578616 -> 578604 (-0.00%)
SpillSGPRs: 1506 -> 1304 (-13.41%)
Latency: 29817034 -> 29821320 (+0.01%); split: -0.03%, +0.05%
InvThroughput: 3581587 -> 3581217 (-0.01%); split: -0.02%, +0.01%
VClause: 124826 -> 124782 (-0.04%); split: -0.04%, +0.00%
SClause: 187916 -> 187645 (-0.14%); split: -0.27%, +0.13%
Copies: 520969 -> 510027 (-2.10%); split: -2.20%, +0.10%
PreSGPRs: 442584 -> 421344 (-4.80%)
VALU: 3810755 -> 3810267 (-0.01%); split: -0.01%, +0.00%
SALU: 763402 -> 752650 (-1.41%); split: -1.48%, +0.07%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>
2024-09-26 14:29:14 +00:00
Georg Lehmann
bcfc5c09fa amd: add offset to is_subgroup_invocation_lt_amd
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>
2024-09-26 14:29:13 +00:00
Marek Olšák
09e64e3682 nir/opt_shrink_vectors: shrink memory loads, not just IO
The problem with radeonsi+ACO is that UBO loads from vec4 uniforms using
only 1 component always load all 4 components. This fixes that.

We are only interested in shrinking UBO and SSBO loads, but I added more
intrinsics because why not.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29384>
2024-09-26 03:01:38 +00:00
Timothy Arceri
f6e7520b13 glsl: remove now unused linker code
This has all be replaced by a nir based linker implementation.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>
2024-09-25 09:39:44 +00:00
Timothy Arceri
fe9b93fc1c nir: handle wildcard array deref
Here we add handling of wildcard array derefs when attempting to mark
an io as partially used rather than hitting an assert.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>
2024-09-25 09:39:44 +00:00
Timothy Arceri
6bb6b0e5ad nir: add nir_intrinsic_deref_implicit_array_length intrinsic
This will be used to handle .length() calls on unsized arrays

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>
2024-09-25 09:39:44 +00:00
Timothy Arceri
60937b5286 nir: add implicit_conversion_prohibited field to nir_parameter
Will be used in link time validation in following patches.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>
2024-09-25 09:39:44 +00:00