Commit graph

392 commits

Author SHA1 Message Date
Kenneth Graunke
f88eb48ff2 anv: Don't consider nir_var_mem_global for vectorizer robustness checks
nir_opt_load_store_vectorize checks for potential address wrapping
when vectorizing two loads ("low" and "high").  It looks for cases where
"low" might have a large address, and "high" has a positive offset
which, when added together, could trigger integer wraparound.  The issue
here is that if the large address of "low" was considered out-of-bounds,
adding offset could wrap around to a small address, which might actually
be in-bounds.  Thus, when loaded separately, "low" will fail and trigger
robustness out-of-bound-read behavior, but "high" would read correctly.
When vectorized, the entire load would fail.  This is explicitly tested
for with 32-bit SSBO addresses in the Vulkan CTS.

However, anv's 64-bit global addresses and VMA handling effectively
prevent this case.  Addresses 0-4095 are a reserved page so that if
people try to use 0 as a NULL pointer, it never maps to a valid BO.
That alone guarantees that the above case where "high" gets a small
address would never be in-bounds, so we don't need to check for it.

In fact, we allocate most user allocations out of high addresses,
and have specialized allocation heaps for certain types of GPU data
structures in the lower GB of memory.  For a load to wrap around and
successfully land in the right heap, it would have to load gigabytes.

Disabling this allows load vectorization and overfetching in more cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
da93b13f8b brw: Use nir_combined_align in brw_nir_should_vectorize_mem
Better than open-coding this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:32 +00:00
Lionel Landwerlin
bfcb9bf276 brw: rename brw_sometimes to intel_sometimes
Moving it to intel_shader_enums.h

The plan is to make it visible to OpenCL shaders.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
2024-11-26 13:05:30 +00:00
Marek Olšák
25d4943481 nir: make use_interpolated_input_intrinsics a nir_lower_io parameter
This will need to be set to true when the GLSL linker lowers IO, which
can later be unlowered by st/mesa, and then drivers can lower it again
without load_interpolated_input. Therefore, it can't be a global
immutable option.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>
2024-11-20 02:45:37 +00:00
Rhys Perry
45c1280d2c nir_lower_mem_access_bit_sizes: pass access to callback
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
61752152f7 nir_lower_mem_access_bit_sizes: add nir_mem_access_shift_method
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Daniel Schürmann
87cb42f953 treewide: don't lower to LCSSA before calling nir_divergence_analysis()
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Kenneth Graunke
834b919f6a brw: Optimize 16-bit texture fetches later
At the point we were calling this, we hadn't necessarily cleaned up
derefs via nir_lower_vars_to_ssa, nor movs/vecs via copy propagation,
so it wasn't necessarily easy for this pass to see the actual usage of
the destination.

Moving this later allows us to detect f2f32(txf(...)) and avoid
converting it to a 16-bit txf (why convert with ALU instructions
when the sampler could do it for us?).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31750>
2024-10-22 01:15:10 +00:00
Lionel Landwerlin
97b17aa0b1 brw/nir: rework inline_data_intel to work with compute
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in the compute shaders on Gfx12.5+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
f5d123b977 brw: delay printf lowering
Useful to insert debug traces a bit later in the lowering process (in
particular after load/store vectorization).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Marek Olšák
65ace5649b nir: reject unsupported component counts from all vectorize callbacks
If you allow an unsupported component count in the callback for loads,
nir_opt_load_store_vectorize will align num_components to the next supported
vector size, essentially overfetching.

This changes all callbacks to reject it. AMD will enable it in a later commit.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
02923e237d nir: add hole_size parameter into the vectorize callback
It will be used to allow merging loads with a hole between them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Lionel Landwerlin
02b124846f brw: fix TGM messages to use cmask lsc opcodes
This is a restriction for TGM.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b55f7716 ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>
2024-09-17 09:28:58 +00:00
Lionel Landwerlin
2159e17da0 brw: remove (load|store)_raw_intel
Those are Elk specific intrinsics.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b8f264cfe4 ("intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>
2024-09-17 09:28:58 +00:00
Ian Romanick
447dae7c13 intel/brw: Use nir_opt_generate_bfi
No shader-db changes on any Intel platform.

The "regression" in SEND messages occurs because a loop containing a
SEND is unrolled.

v2: Move after nir_opt_algebraic. Suggested by Georg.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19787034 -> 19785933 (<.01%)
instructions in affected programs: 373573 -> 372472 (-0.29%)
helped: 541 / HURT: 6

total cycles in shared programs: 906012612 -> 905626304 (-0.04%)
cycles in affected programs: 58456516 -> 58070208 (-0.66%)
helped: 382 / HURT: 180

fossil-db:

Lunar Lake
Totals:
Instrs: 140671401 -> 140670495 (-0.00%); split: -0.00%, +0.00%
Send messages: 12891430822 -> 12891430834 (+0.00%)
Loop count: 46905 -> 46904 (-0.00%)
Cycle count: 21527511599 -> 21530278999 (+0.01%); split: -0.00%, +0.02%
Spill count: 70728 -> 70766 (+0.05%)
Fill count: 139397 -> 139254 (-0.10%); split: -0.13%, +0.02%
Max live registers: 47512432 -> 47512500 (+0.00%)

Totals from 355 (0.06% of 549270) affected shaders:
Instrs: 878953 -> 878047 (-0.10%); split: -0.18%, +0.08%
Send messages: 19289 -> 19301 (+0.06%)
Loop count: 1243 -> 1242 (-0.08%)
Cycle count: 1434664642 -> 1437432042 (+0.19%); split: -0.06%, +0.25%
Spill count: 15826 -> 15864 (+0.24%)
Fill count: 38454 -> 38311 (-0.37%); split: -0.46%, +0.08%
Max live registers: 52530 -> 52598 (+0.13%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152516575 -> 152516147 (-0.00%); split: -0.00%, +0.00%
Send messages: 7491001 -> 7491013 (+0.00%)
Loop count: 47588 -> 47587 (-0.00%)
Cycle count: 17124433133 -> 17126147156 (+0.01%); split: -0.01%, +0.02%
Max live registers: 31854704 -> 31854764 (+0.00%)

Totals from 402 (0.06% of 633223) affected shaders:
Instrs: 839338 -> 838910 (-0.05%); split: -0.09%, +0.04%
Send messages: 20203 -> 20215 (+0.06%)
Loop count: 1243 -> 1242 (-0.08%)
Cycle count: 1327042160 -> 1328756183 (+0.13%); split: -0.11%, +0.24%
Max live registers: 33237 -> 33297 (+0.18%)

Tiger Lake
*** Shaders only in 'before' results are ignored:
fossil-db/steam-native/wolfenstein_youngblood/b8cefe7f700304c4/fs.32/0
from 1 apps: fossil-db/steam-native/wolfenstein_youngblood

Totals:
Instrs: 150549467 -> 150548952 (-0.00%); split: -0.00%, +0.00%
Send messages: 7495582 -> 7495594 (+0.00%)
Loop count: 46605 -> 46604 (-0.00%)
Cycle count: 15472381586 -> 15472247085 (-0.00%); split: -0.00%, +0.00%
Spill count: 59776 -> 59775 (-0.00%)
Fill count: 103475 -> 103464 (-0.01%)
Scratch Memory Size: 2384896 -> 2383872 (-0.04%)
Max live registers: 31760724 -> 31760787 (+0.00%)
Max dispatch width: 5569928 -> 5569912 (-0.00%)

Totals from 525 (0.08% of 632443) affected shaders:
Instrs: 349074 -> 348559 (-0.15%); split: -0.25%, +0.11%
Send messages: 24355 -> 24367 (+0.05%)
Loop count: 849 -> 848 (-0.12%)
Cycle count: 187080291 -> 186945790 (-0.07%); split: -0.19%, +0.12%
Spill count: 483 -> 482 (-0.21%)
Fill count: 1372 -> 1361 (-0.80%)
Scratch Memory Size: 22528 -> 21504 (-4.55%)
Max live registers: 36705 -> 36768 (+0.17%)
Max dispatch width: 6272 -> 6256 (-0.26%)

Ice Lake
Totals:
Instrs: 151804923 -> 151804396 (-0.00%); split: -0.00%, +0.00%
Send messages: 7553216 -> 7553228 (+0.00%)
Loop count: 46196 -> 46195 (-0.00%)
Cycle count: 15099805668 -> 15099533898 (-0.00%); split: -0.00%, +0.00%
Fill count: 103978 -> 103979 (+0.00%)
Max live registers: 32168254 -> 32168323 (+0.00%)

Totals from 527 (0.08% of 637191) affected shaders:
Instrs: 347482 -> 346955 (-0.15%); split: -0.25%, +0.10%
Send messages: 24586 -> 24598 (+0.05%)
Loop count: 849 -> 848 (-0.12%)
Cycle count: 191147758 -> 190875988 (-0.14%); split: -0.16%, +0.02%
Fill count: 1392 -> 1393 (+0.07%)
Max live registers: 37379 -> 37448 (+0.18%)

Skylake
Totals:
Instrs: 140981504 -> 140980647 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14653477192 -> 14653249734 (-0.00%); split: -0.00%, +0.00%
Fill count: 99636 -> 99637 (+0.00%)
Max live registers: 31472062 -> 31472126 (+0.00%)

Totals from 523 (0.08% of 626432) affected shaders:
Instrs: 335551 -> 334694 (-0.26%); split: -0.26%, +0.01%
Cycle count: 178047284 -> 177819826 (-0.13%); split: -0.14%, +0.02%
Fill count: 1100 -> 1101 (+0.09%)
Max live registers: 36734 -> 36798 (+0.17%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>
2024-09-13 00:21:00 +00:00
Kenneth Graunke
b8f264cfe4 intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
8a6903e50d intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop"
This is going to handle more than atomics shortly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Ian Romanick
13332c236b intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup
I observed some ray tracing shaders where a resource_intel inside a
loop was non-uniform, and some code was lowered to account for
that. Eventually the loop containing the resource_intel was unrolled,
and the resource_intel became uniform.

For example, nir_opt_uniform_subgroup can transform something like

    con loop {
        con block b5:        // preds: b4 b8
        con 32    %330 = @read_first_invocation (%329)
        con 1     %331 = ieq %330, %329
                         // succs: b6 b7
        if %331 {
            con block b6:        // preds: b5
            con 32    %332 = iadd %120.b, %330
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
                             break
                             // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

into

    con loop {
        con block b5:        // preds: b4 b8
        con 1     %331 = ieq %329, %329
                         // succs: b6 b7
        if %331 {
            con block b6:        // preds: b5
            con 32    %332 = iadd %120.b, %329
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
                             break
                             // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

Notice that %331 is now a tautology. Running brw_nir_optimize again
eliminates the loop.

v2: Add a comment in the code explaining the rationale. Suggested by
Ken. Update the commit message. Suggested by Caio.

shader-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733448 -> 19733330 (<.01%)
instructions in affected programs: 14120 -> 14002 (-0.84%)
helped: 32 / HURT: 3

total cycles in shared programs: 916254496 -> 916226288 (<.01%)
cycles in affected programs: 2035116 -> 2006908 (-1.39%)
helped: 19 / HURT: 13

total spills in shared programs: 5807 -> 5807 (0.00%)
spills in affected programs: 26 -> 26 (0.00%)
helped: 1 / HURT: 1

total fills in shared programs: 6794 -> 6792 (-0.03%)
fills in affected programs: 84 -> 82 (-2.38%)
helped: 1 / HURT: 1

LOST:   1
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20393084 -> 20392971 (<.01%)
instructions in affected programs: 21750 -> 21637 (-0.52%)
helped: 31 / HURT: 4

total cycles in shared programs: 880273065 -> 880247818 (<.01%)
cycles in affected programs: 2546748 -> 2521501 (-0.99%)
helped: 18 / HURT: 9

total spills in shared programs: 4628 -> 4630 (0.04%)
spills in affected programs: 287 -> 289 (0.70%)
helped: 1 / HURT: 2

total fills in shared programs: 5381 -> 5376 (-0.09%)
fills in affected programs: 711 -> 706 (-0.70%)
helped: 2 / HURT: 2

LOST:   1
GAINED: 1

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00%
Send messages: 7459339 -> 7459396 (+0.00%)
Loop count: 49111 -> 47588 (-3.10%)
Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01%
Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01%
Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00%
Scratch Memory Size: 4136960 -> 4130816 (-0.15%)
Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00%

Totals from 672 (0.11% of 630198) affected shaders:
Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17%
Send messages: 54302 -> 54359 (+0.10%)
Loop count: 6124 -> 4601 (-24.87%)
Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16%
Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08%
Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01%
Scratch Memory Size: 740352 -> 734208 (-0.83%)
Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 7685264 -> 7685256 (-0.00%)
Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00%
Spill count: 61238 -> 61240 (+0.00%)
Fill count: 107301 -> 107289 (-0.01%)
Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00%

Totals from 553 (0.09% of 629912) affected shaders:
Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01%
Subgroup size: 8648 -> 8640 (-0.09%)
Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24%
Spill count: 181 -> 183 (+1.10%)
Fill count: 440 -> 428 (-2.73%)
Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
65eb7ed5fc intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize
Without this, the next commit tiggers assertions.

v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by
Caio.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Jesse Natalie
03655dfda1 compiler, vk: Support subgroup size of 4
Relax the assert and assign it an enum value

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>
2024-08-29 03:30:31 +00:00
Kenneth Graunke
6a292c2699 intel: Fix bad align_offset on global_constant_uniform_block_intel
We were specifying align_offset = 64 and align_mul = 64, which is
invalid.  nir_combined_align() asserts that align_offset < align_mul.

Our intention here is to perform cacheline-aligned (64B-aligned) block
loads, so we should set align_mul = 64 and can leave align_offset = 0.

Fixes: fbafa9cabd ("intel/nir: remove load_global_const_block_intel intrinsic")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>
2024-08-21 20:44:57 +00:00
Lionel Landwerlin
fbafa9cabd intel/nir: remove load_global_const_block_intel intrinsic
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
2024-08-16 11:12:39 +00:00
Ian Romanick
119801e647 intel/brw: Move fsat instructions closer to the source
Intel GPUs have a saturate destination modifier, and
brw_fs_opt_saturate_propagation tries to replace explicit saturate
operations with this destination modifier. That pass is limited in
several ways. If the source of the explicit saturate is in a different
block or if the source of the explicit saturate is live after the
explicit saturate, brw_fs_opt_saturate_propagation will be unable to
make progress.

This optimization exists to help brw_fs_opt_saturate_propagation make
more progress. It tries to move NIR fsat instructions to the same block
that contains the definition of its source. It does this only in cases
where it will not create additional live values. It also attempts to do
this only in cases where the explicit saturate will ultimiately be
converted to a destination modifier.

v2: Fix metadata_preserve when theres no progress and use
nir_metadata_control_flow when there is progress. All suggested by
Alyssa.

v3: Fix a typo in the file header comment. Noticed by Ken.  Don't
require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead
of open-coding something slightly more specific. Both suggested by Ken.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733645 -> 19733028 (<.01%)
instructions in affected programs: 193300 -> 192683 (-0.32%)
helped: 246
HURT: 1
helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2
helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for instructions value: -2.87 -2.13
95% mean confidence interval for instructions %-change: -0.34% -0.32%
Instructions are helped.

total cycles in shared programs: 916180971 -> 916264656 (<.01%)
cycles in affected programs: 30197180 -> 30280865 (0.28%)
helped: 194
HURT: 142
helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19
helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23%
HURT stats (abs)   min: 1 max: 28058 x̄: 1781.68 x̃: 399
HURT stats (rel)   min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63%
95% mean confidence interval for cycles value: -196.84 694.97
95% mean confidence interval for cycles %-change: -0.17% 1.27%
Inconclusive result (value mean confidence interval includes 0).

fossil-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02%
Max live registers: 32013312 -> 32013549 (+0.00%)
Max dispatch width: 5512304 -> 5512136 (-0.00%)

Totals from 774 (0.12% of 630172) affected shaders:
Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01%
Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30%
Max live registers: 82195 -> 82432 (+0.29%)
Max dispatch width: 6664 -> 6496 (-2.52%)

Ice Lake
Totals:
Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01%
Max live registers: 32471367 -> 32471603 (+0.00%)
Max dispatch width: 5623752 -> 5623712 (-0.00%)

Totals from 733 (0.12% of 635598) affected shaders:
Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01%
Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64%
Max live registers: 72067 -> 72303 (+0.33%)
Max dispatch width: 6216 -> 6176 (-0.64%)

Skylake
Totals:
Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00%

Totals from 659 (0.11% of 625670) affected shaders:
Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, +0.00%
Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41%
Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:10 -07:00
Kenneth Graunke
b6f4f64b43 intel/brw: Drop image_{load,store}_raw_intel handling
Gfx8 required us to emulate image load store with untyped messages,
whereas Gfx9 just has typed message support for everything.  brw no
longer supports Gfx8, so all of this code is effectively dead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>
2024-08-09 07:20:08 +00:00
Alyssa Rosenzweig
d99c2ef059 nir/opt_uniform_atomics: add fs atomics predicated? flag
on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in
this pass is redundant. and would cause a problem since we'd then have to lower
the "is helper inv?" flag late. so just skip the extra lowering code.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>
2024-08-06 11:48:17 -04:00
Kenneth Graunke
7c579f448f intel/brw: Mark all UBO access with a direct buffer index as speculative
UBO loads with a non-indirect buffer index should be safe to perform
speculatively.  With a direct offset, we may sometimes turn them into
push constants, at which point it's just reading a register with no
cost at all.  Otherwise, we access them via messages that use surface
state, and automatically perform bounds checking.  So we shouldn't have
any issues with reading out of bounds and page faulting, for example.

This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics,
so we can turn simple if's with loads on both sides to bcsels.  In some
cases this can collapse a surprising amount of control flow, allowing
other optimizations to work better.

The i965 OpenGL driver used load_uniform intrinsics, which are allowed
in NIR's peephole select pass.  But iris uses the Gallium NIR pass that
translates uniforms to loads from UBO 0, so we haven't been able to take
advantage of NIR's peephole select pass there.  The backend pass was
still able to handle this to some extent, however.

fossil-db results on Alchemist:

   Totals:
   Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00%
   Send messages: 7416330 -> 7416261 (-0.00%)
   Spill count: 52471 -> 52473 (+0.00%)
   Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00%
   Scratch Memory Size: 3197952 -> 3198976 (+0.03%)

   Totals from 1848 (0.29% of 630003) affected shaders:
   Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02%
   Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03%
   Send messages: 59829 -> 59760 (-0.12%)
   Spill count: 3870 -> 3872 (+0.05%)
   Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02%
   Scratch Memory Size: 174080 -> 175104 (+0.59%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Sushma Venkatesh Reddy
0116430d39 intel/brw: Handle 16-bit sampler return payloads
API requires samplers to return 32-bit even though hardware can handle
16-bit floating point, so we detect that case and make more efficient
use of memory BW. This is helping improve performance of encode and
decode tokens during LLM by at least 5% across multiple platforms.

Thank you Kenneth Graunke for suggesting and guiding me throughout
this implementation.

Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
2024-07-31 21:26:46 +00:00
Marek Olšák
b2d32ae246 nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag
Instead of having 1 bit in nir_io_semantics indicating a per-primitive
FS input, add a dedicated intrinsic for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>
2024-07-23 16:13:16 +00:00
Qiang Yu
3151f5ec47 nir: add filter parameter to nir_lower_array_deref_of_vec
To be used by latter commits to limit the lowering to specific
variables.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>
2024-07-03 02:06:56 +00:00
Francisco Jerez
e8007c9325 intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+.
Floating-point offsets work fine in combination with the
floating-point arithmetic we're about to lower these intrinsics into,
and they require less instructions than converting to fixed-point and
then back.  No reason to take the precision/range hit nor the extra
instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
2024-06-27 00:18:00 +00:00
Alyssa Rosenzweig
da752ed7c1 treewide: use nir_def_replace sometimes
Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but
it's a start.

Coccinelle patch:

    @@
    expression intr, repl;
    @@

    -nir_def_rewrite_uses(&intr->def, repl);
    -nir_instr_remove(&intr->instr);
    +nir_def_replace(&intr->def, repl);

Coccinelle patch:

    @@
    identifier intr;
    expression instr, repl;
    @@

    nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    ...
    -nir_def_rewrite_uses(&intr->def, repl);
    -nir_instr_remove(instr);
    +nir_def_replace(&intr->def, repl);

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna]
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>
2024-06-21 15:36:56 +00:00
Alyssa Rosenzweig
15257b65c6 treewide: use nir_metadata_control_flow
Via Coccinelle patch:

    @@
    @@

    -nir_metadata_block_index | nir_metadata_dominance
    +nir_metadata_control_flow

...plus some manual fixups for call sites missed by coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>
2024-06-17 16:28:14 -04:00
Ian Romanick
7b7e5cf5d4 nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers
v2: Add some comments explaining some of the nuance of the shift
optimizations. Fix a bug in the shift count calculation of the upper
32-bits. Move the @64 from the variable to the opcode. All suggested
by Jordan.

No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154507026 -> 154506576 (-0.00%)
Cycle count: 17436298868 -> 17436295016 (-0.00%)
Max live registers: 32635309 -> 32635297 (-0.00%)

Totals from 42 (0.01% of 632575) affected shaders:
Instrs: 5616 -> 5166 (-8.01%)
Cycle count: 133680 -> 129828 (-2.88%)
Max live registers: 1158 -> 1146 (-1.04%)

No fossil-db changes on any other Intel platform.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Lionel Landwerlin
9a36278475 intel/nir: add printf lowering
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:38 +00:00
Ian Romanick
3f151c03af intel/brw: Handle fsign optimization in a NIR algebraic pass
This is a lot less code, and it makes it easier to experiment with other
pattern-based optimizations in the future.

The results here are nearly identical to the results I got from Ken's
"intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not
particularly good.

In this commit and in Ken's, all of the shader-db shaders hurt for
spills and fills are from Deus Ex Mankind Divided. Each shader has a
bunch of texture instructions with a single fsign between the
blocks. With the dependency on the flag removed, the scheduler puts all
of the texture instructions at the start... and there are a LOT of them.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19647060 -> 19650207 (0.02%)
instructions in affected programs: 734718 -> 737865 (0.43%)
helped: 382 / HURT: 1984

total cycles in shared programs: 823238442 -> 822785913 (-0.05%)
cycles in affected programs: 426901157 -> 426448628 (-0.11%)
helped: 3408 / HURT: 3671

total spills in shared programs: 3887 -> 3891 (0.10%)
spills in affected programs: 256 -> 260 (1.56%)
helped: 0 / HURT: 4

total fills in shared programs: 3236 -> 3306 (2.16%)
fills in affected programs: 882 -> 952 (7.94%)
helped: 0 / HURT: 12

LOST:   37
GAINED: 34

fossil-db:

DG2 and Meteor Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00%
Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04%
Spill count: 142078 -> 142090 (+0.01%)
Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01%
Max live registers: 32593578 -> 32593858 (+0.00%)
Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01%

Totals from 5867 (0.93% of 631350) affected shaders:
Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09%
Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39%
Spill count: 26411 -> 26423 (+0.05%)
Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04%
Max live registers: 431561 -> 431841 (+0.06%)
Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63%

Tiger Lake
Totals:
Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03%
Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01%
Max live registers: 32249201 -> 32249464 (+0.00%)
Max dispatch width: 5540608 -> 5540584 (-0.00%)

Totals from 5862 (0.93% of 630309) affected shaders:
Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10%
Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72%
Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08%
Max live registers: 424560 -> 424823 (+0.06%)
Max dispatch width: 50304 -> 50280 (-0.05%)

Ice Lake
Totals:
Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00%
Spill count: 60087 -> 60090 (+0.00%)
Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32605215 -> 32605495 (+0.00%)
Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00%

Totals from 5882 (0.93% of 634934) affected shaders:
Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10%
Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05%
Spill count: 10278 -> 10281 (+0.03%)
Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01%
Max live registers: 424184 -> 424464 (+0.07%)
Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18%

Skylake
Totals:
Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00%
Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00%
Spill count: 58176 -> 58172 (-0.01%)
Fill count: 95877 -> 95796 (-0.08%)
Max live registers: 31924594 -> 31924874 (+0.00%)
Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00%

Totals from 5789 (0.93% of 625512) affected shaders:
Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10%
Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05%
Spill count: 9248 -> 9244 (-0.04%)
Fill count: 19677 -> 19596 (-0.41%)
Max live registers: 415340 -> 415620 (+0.07%)
Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Ian Romanick
24cdbbdaa2 intel/brw: Delete stray nir_opt_dce
No shader-db or fossil-db changes on any Intel platform.

Fixes: f76f4be301 ("intel/compiler: move gen5 final pass to actually be final pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
6377e8fd29 intel/brw: Don't call nir_opt_remove_phis before nir_convert_from_ssa
Per discussion in #10727, removing phis breaks LCSSA form which in turn
invalidates divergence analysis.

shader-db:

All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20299612 -> 20299695 (<.01%)
instructions in affected programs: 20829 -> 20912 (0.40%)
helped: 6 / HURT: 13

total cycles in shared programs: 842149085 -> 842148399 (<.01%)
cycles in affected programs: 15146222 -> 15145536 (<.01%)
helped: 40 / HURT: 45

fossil-db:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165505077 -> 165505603 (+0.00%); split: -0.00%, +0.00%
Cycles: 15144183575 -> 15144235695 (+0.00%); split: -0.00%, +0.00%
Spill count: 45213 -> 45220 (+0.02%)
Fill count: 74166 -> 74184 (+0.02%)

Totals from 94 (0.01% of 656116) affected shaders:
Instrs: 263079 -> 263605 (+0.20%); split: -0.00%, +0.20%
Cycles: 28411487 -> 28463607 (+0.18%); split: -0.18%, +0.37%
Spill count: 3474 -> 3481 (+0.20%)
Fill count: 6713 -> 6731 (+0.27%)

Fixes: 6dbb5f1e07 ("intel/fs: rerun divergence analysis prior to convert_from_ssa")
Closes: #10727
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Dylan Baker
75ede9d9bc intel/brw: track last successful pass and leave the loop early
This is similar to what RADV implements using the NIR_LOOP_PASS
helpers. I have not used those helpers for a couple of reasons:

 1. They use the pointer to the optimization function, which doesn't
    work if the same function is called multiple times in one invocation
    of the loop (fixable)
 2. After fixing them, due to Intel's use of sub-expressions, the amount
    of code added to wrap the shared macro becomes more than simply
    reimplementing them for the Intel compiler

On most workloads the results are a wash, but on compile heavy
workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw
fossil-db runtimes fall by 1-2% on my ICL, with no changes to the
compiled shaders. Caio saw closer to 2.5% on TGL.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>
2024-03-21 23:02:32 +00:00
Alyssa Rosenzweig
a6123a80da nir/opt_shrink_vectors: shrink some intrinsics from start
If the backend supports it, intrinsics with a component() are straightforward to
shrink from the start. Notably helps vectorized I/O.

v2: add an option for this and enable only on grown up backends, because some
backends ignore the component() parameter.

RADV GFX11:
Totals from 921 (1.16% of 79439) affected shaders:
Instrs: 616558 -> 615529 (-0.17%); split: -0.30%, +0.14%
CodeSize: 3099864 -> 3095632 (-0.14%); split: -0.25%, +0.11%
Latency: 2177075 -> 2160966 (-0.74%); split: -0.79%, +0.05%
InvThroughput: 299997 -> 298664 (-0.44%); split: -0.47%, +0.02%
VClause: 16343 -> 16395 (+0.32%); split: -0.01%, +0.32%
SClause: 10715 -> 10714 (-0.01%)
Copies: 24736 -> 24701 (-0.14%); split: -0.37%, +0.23%
PreVGPRs: 30179 -> 30173 (-0.02%)
VALU: 353472 -> 353439 (-0.01%); split: -0.03%, +0.02%
SALU: 40323 -> 40322 (-0.00%)
VMEM: 25353 -> 25352 (-0.00%)

AGX:

total instructions in shared programs: 2038217 -> 2038049 (<.01%)
instructions in affected programs: 10249 -> 10081 (-1.64%)

total alu in shared programs: 1593094 -> 1592939 (<.01%)
alu in affected programs: 7145 -> 6990 (-2.17%)

total fscib in shared programs: 1589254 -> 1589102 (<.01%)
fscib in affected programs: 7217 -> 7065 (-2.11%)

total bytes in shared programs: 13975666 -> 13974722 (<.01%)
bytes in affected programs: 65942 -> 64998 (-1.43%)

total regs in shared programs: 592758 -> 591187 (-0.27%)
regs in affected programs: 6936 -> 5365 (-22.65%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28004>
2024-03-12 18:17:17 +00:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Kenneth Graunke
5fbba530cf intel/brw: Delete compiler->supports_shader_constants
True for all drivers using this compiler.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27872>
2024-02-29 18:00:14 +00:00
Caio Oliveira
63a4a4400a intel/brw: Remove edgeflag_is_last VS parameter
Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5a3f65e678 intel/brw: Remove unused attrib workarounds
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
d3e451780b intel/brw: Inline brw_nir_apply_sampler_key code
It doesn't use the prog_key anymore, so just move the nir_lower_tex
call pass to the single callsite.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
a1e694a890 intel/brw: Remove Gfx8- code from NIR passes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
7c23b90537 intel/brw: Always use scalar shaders
Remove scalar_stage[] array, since now it is always scalar.  This
removes any usage of vec4 shaders in brw.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:37 +00:00
Caio Oliveira
303fd4e935 intel/brw: Move type_size_* functions out of vec4-specific file
Will make easier later to delete vec4 files.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:37 +00:00
Ian Romanick
535caaf3e0 nir: Optimize uniform iadd, fadd, and ixor reduction operations
This adds optimizations for iadd, fadd, and ixor with reduce,
inclusive scan, and exclusive scan.

NOTE: The fadd and ixor optimizations had no shader-db or fossil-db
changes on any Intel platform.

NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size
and base-local-size.shader_test on DG2 and MTL. This is just changing
the code path taken to not use whatever path was not working properly
before.

This is a subset of the things optimized by ACO. See also
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The
min, max, iand, and ior exclusive_scan optimizations are not
implemented.

Broadwell on shader-db is not happy. I have not investigated.

v2: Silence some warnings about discarding const.

v3: Rename mbcnt to count_active_invocations. Add a big comment
explaining the differences between the two paths. Suggested by Rhys.

shader-db:

All Gfx9 and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300384 -> 20299545 (<.01%)
instructions in affected programs: 19167 -> 18328 (-4.38%)
helped: 35 / HURT: 0

total cycles in shared programs: 842809750 -> 842766381 (<.01%)
cycles in affected programs: 2160249 -> 2116880 (-2.01%)
helped: 33 / HURT: 2

total spills in shared programs: 4632 -> 4626 (-0.13%)
spills in affected programs: 206 -> 200 (-2.91%)
helped: 3 / HURT: 0

total fills in shared programs: 5594 -> 5581 (-0.23%)
fills in affected programs: 664 -> 651 (-1.96%)
helped: 3 / HURT: 1

fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165551893 -> 165513303 (-0.02%)
Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00%
Spill count: 45258 -> 45204 (-0.12%)
Fill count: 74286 -> 74157 (-0.17%)
Scratch Memory Size: 2467840 -> 2451456 (-0.66%)

Totals from 712 (0.11% of 656120) affected shaders:
Instrs: 598931 -> 560341 (-6.44%)
Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04%
Spill count: 983 -> 929 (-5.49%)
Fill count: 2274 -> 2145 (-5.67%)
Scratch Memory Size: 52224 -> 35840 (-31.37%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 09:44:11 -08:00
Ian Romanick
c63ea755fe intel/fs: Use nir_opt_uniform_subgroup
shader-db:

All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300435 -> 20300384 (<.01%)
instructions in affected programs: 303 -> 252 (-16.83%)
helped: 2 / HURT: 0

total cycles in shared programs: 842810326 -> 842809750 (<.01%)
cycles in affected programs: 8374 -> 7798 (-6.88%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms (note below) had similar results. (Ice Lake shown)
Instrs: 165559735 -> 165551893 (-0.00%)
Cycles: 15133083961 -> 15132539132 (-0.00%); split: -0.00%, +0.00%
Spill count: 45262 -> 45258 (-0.01%)
Fill count: 74293 -> 74286 (-0.01%)

Totals from 854 (0.13% of 656120) affected shaders:
Instrs: 3461998 -> 3454156 (-0.23%)
Cycles: 154252729 -> 153707900 (-0.35%); split: -0.36%, +0.01%
Spill count: 2655 -> 2651 (-0.15%)
Fill count: 3881 -> 3874 (-0.18%)

DG2 did not see changes in spills or fills.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:38:45 -08:00