Commit graph

3488 commits

Author SHA1 Message Date
Ian Romanick
22095c60bc nir/algebraic: Add nir_lower_int64_options::nir_lower_iadd3_64
This allows us to not generate 64-bit iadd3 on Intel but continue
generating it for NVIDIA.

No shader-db or fossil-db changes.

v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit
iadd3 on NVIDIA platforms.

v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of
the optimizations from ever occuring.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Georg Lehmann
dcab408a6c nir: remove unpack_half_flush_to_zero
It doesn't make sense to have two sets of opcodes for this when all backends
that support the flush_to_zero variant just rely on the global floating point
mode anyway.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>
2024-05-31 09:46:35 +00:00
Jordan Justen
fbf5ea6b44 intel/dev: Silence INTEL_FORCE_PROBE warning for intel_clc
Running intel_clc as part of the build doesn't need to issue this
warning.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29445>
2024-05-30 22:28:50 +00:00
Kenneth Graunke
fbe0f8d36d intel/brw: Blockify convergent load_shared on Gfx11-12 as well
Gfx11-12 can support SLM block loads via OWord Block Load messages
(notably, the aligned version, not the unaligned version).

A while back we deleted the SHADER_OPCODE_OWORD_BLOCK_READ opcode.
Rather than bring it back, we continue using UNALIGNED_OWORD_BLOCK_READ
for SLM block access (like we do for SSBOs) but switch it over to the
aligned variant when lowering logical sends.  We do ensure the alignment
is at least 16B, however.  This is ugly, but it's probably not worth
bringing back a whole extra opcode for a legacy HDC block load quirk.

References: BSpec 47652 and 1689
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9960
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29429>
2024-05-30 22:01:10 +00:00
José Roberto de Souza
f5f71bae02 intel: Move slm functions from brw_compiler.h to intel_compute_slm.c/h
This functions were inlined in a header and duplicated between brw and
elk.
That would be enough reasons to move to a C file but next patches
will add more code to support Xe2 platforms, what would cause more
code to be inlined, duplicating even more code and increasing lib
size.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28910>
2024-05-30 16:46:16 +00:00
Jordan Justen
4ffe1a9f9e intel/brw: Fix SSBO/shared load offset register size for Xe2
Rework:
 * Ken: Reword commit message

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>
2024-05-28 18:45:49 +00:00
Jordan Justen
4bc4da01f4 intel/brw: Allow xe2 in brw_stage_has_packed_dispatch()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>
2024-05-28 18:45:49 +00:00
Jordan Justen
739613ec70 intel/brw: Simplify enabling brw_fs_test_dispatch_packing
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29273>
2024-05-28 18:45:49 +00:00
Lionel Landwerlin
2c65d90bc8 intel/brw: ensure find_live_channel don't access arch register without sync
Another architecture register that requires some care before reading.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 49ee3ae9e8 ("intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+")
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29319>
2024-05-24 07:26:17 +00:00
Francisco Jerez
eebc4ec264 intel/brw/xe2+: Round up spill/unspill data size to nearest reg_size multiple.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:52 +00:00
Francisco Jerez
50daf161f4 intel/brw/xe2+: Lower 64-bit integer uadd_sat.
Fixes failures of CTS tests that currently end up emitting 64-bit
integer ADDs with saturation, which isn't supported by the hardware.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:52 +00:00
Francisco Jerez
4bb5b25e53 intel/xe2+: Enable native 64-bit integer arithmetic.
Note that some previously-supported 64-bit integer operations have
been removed from the hardware, so we need to instruct NIR to lower
them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Francisco Jerez
8be9f00d84 intel/brw/xe2+: Lower 64-bit SHUFFLE and CLUSTER_BROADCAST.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Francisco Jerez
6261f4d361 intel/brw/xe2+: Fix 64-bit subgroup scan intrinsics not to rely on SEL instructions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Francisco Jerez
1bf93ee4ec intel/brw/xe2+: Don't use SEL peephole on 64-bit moves.
64-bit SEL isn't supported by the INT pipeline on this platform.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Francisco Jerez
8f798cc911 intel/brw/xe2+: Fix indirect extended descriptor setup for scratch space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Francisco Jerez
0d92ec44e5 intel/brw: Don't emit Z coordinate interpolation if CPS isn't in use.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Francisco Jerez
7c129d9365 intel/brw/xe2+: Keep PS sample mask in the f1.0 register whether or not kill is used.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Rohan Garg
7668de019b intel/eu/xe2+: Fix src1 length bits of SEND instruction with UGM target.
Rework:
 * Francisco Jerez: Specify the src1 length value in the correct
   units. Don't break earlier platforms.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Lionel Landwerlin
5b76696861 intel/clc: enable printfs support
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:38 +00:00
Lionel Landwerlin
9a36278475 intel/nir: add printf lowering
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:38 +00:00
Lionel Landwerlin
6a8ff3b550 intel/compiler: store u_printf_info in prog_data
So that the driver can decode the printf buffer.

We're not going to use the NIR data directly from the driver
(Iris/Anv) because the late compile steps might want to add more
printfs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:38 +00:00
Lionel Landwerlin
ecbec25e84 intel/nir: add reloc delta to load_reloc_const_intel intrinsic
We'll use the delta for an upcoming internal printf mechanism, where
the PARAM_IDX will be the base printf reloc identifier and the BASE
will be the string id.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:38 +00:00
Lionel Landwerlin
dde91d18c2 intel/nir: remove unused prototypes
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:37 +00:00
Rohan Garg
aa9244c8f6 intel/brw: update Xe2 max SIMD message sizes
All the non-transpose messages are SIMD 1,2,4,8,16,32 capable (BSpec
57330)

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29212>
2024-05-15 12:02:02 +00:00
Iván Briano
a9f24fb5f1 intel/brw: fix subgroup size of geometry stages for lnl+
Fixes dEQP-VK.subgroups.size_control.*allow_varying_subgroup_size* and
maybe others checking subgroup size.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29177>
2024-05-14 23:13:37 +00:00
Ian Romanick
97e3c6a12a intel/brw: Use range analysis to optimize fsign
shader-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19674784 -> 19665960 (-0.04%)
instructions in affected programs: 933425 -> 924601 (-0.95%)
helped: 3656 / HURT: 0

total cycles in shared programs: 810343919 -> 810241030 (-0.01%)
cycles in affected programs: 56752034 -> 56649145 (-0.18%)
helped: 3032 / HURT: 434

LOST:   11
GAINED: 0

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20315795 -> 20305856 (-0.05%)
instructions in affected programs: 979698 -> 969759 (-1.01%)
helped: 3845 / HURT: 0

total cycles in shared programs: 830600281 -> 830534694 (<.01%)
cycles in affected programs: 45675615 -> 45610028 (-0.14%)
helped: 3250 / HURT: 325

total spills in shared programs: 4583 -> 4565 (-0.39%)
spills in affected programs: 180 -> 162 (-10.00%)
helped: 3 / HURT: 0

total fills in shared programs: 5245 -> 5219 (-0.50%)
fills in affected programs: 379 -> 353 (-6.86%)
helped: 3 / HURT: 0

LOST:   14
GAINED: 8

fossil-db:

All Intel platforms except Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 154024263 -> 154023814 (-0.00%)
Cycle count: 17463341602 -> 17461726239 (-0.01%); split: -0.01%, +0.00%

Totals from 322 (0.05% of 631440) affected shaders:
Instrs: 199933 -> 199484 (-0.22%)
Cycle count: 168492537 -> 166877174 (-0.96%); split: -0.96%, +0.00%

Tiger Lake
Instrs: 149984723 -> 149984287 (-0.00%)
Cycle count: 15238596937 -> 15239260415 (+0.00%); split: -0.00%, +0.01%
Max dispatch width: 5553408 -> 5553424 (+0.00%)

Totals from 318 (0.05% of 631414) affected shaders:
Instrs: 179624 -> 179188 (-0.24%)
Cycle count: 160724533 -> 161388011 (+0.41%); split: -0.06%, +0.48%
Max dispatch width: 3296 -> 3312 (+0.49%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:21 +00:00
Ian Romanick
e578657313 intel/brw: Implement more strictly correct fsign lowering
The huge amount of helped shaders is due to the "~" versions of the
patterns.

shader-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19672345 -> 19662605 (-0.05%)
instructions in affected programs: 1147766 -> 1138026 (-0.85%)
helped: 2691 / HURT: 1650

total cycles in shared programs: 810323688 -> 810145191 (-0.02%)
cycles in affected programs: 68918312 -> 68739815 (-0.26%)
helped: 3651 / HURT: 1832

LOST:   29
GAINED: 38

Tiger Lake
total instructions in shared programs: 19489619 -> 19479909 (-0.05%)
instructions in affected programs: 1124564 -> 1114854 (-0.86%)
helped: 2682 / HURT: 1643

total cycles in shared programs: 811468406 -> 811706747 (0.03%)
cycles in affected programs: 66397690 -> 66636031 (0.36%)
helped: 3692 / HURT: 1775

total spills in shared programs: 3906 -> 3907 (0.03%)
spills in affected programs: 16 -> 17 (6.25%)
helped: 0 / HURT: 1

total fills in shared programs: 3220 -> 3222 (0.06%)
fills in affected programs: 50 -> 52 (4.00%)
helped: 0 / HURT: 1

LOST:   33
GAINED: 36

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20317882 -> 20307495 (-0.05%)
instructions in affected programs: 1199651 -> 1189264 (-0.87%)
helped: 2863 / HURT: 1680

total cycles in shared programs: 830880024 -> 830457927 (-0.05%)
cycles in affected programs: 63347102 -> 62925005 (-0.67%)
helped: 4118 / HURT: 1622

total spills in shared programs: 4593 -> 4583 (-0.22%)
spills in affected programs: 205 -> 195 (-4.88%)
helped: 4 / HURT: 0

total fills in shared programs: 5284 -> 5245 (-0.74%)
fills in affected programs: 464 -> 425 (-8.41%)
helped: 4 / HURT: 0

LOST:   70
GAINED: 33

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154025275 -> 154022035 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17472869499 -> 17463289530 (-0.05%); split: -0.06%, +0.00%
Spill count: 141269 -> 141246 (-0.02%); split: -0.02%, +0.00%
Fill count: 265342 -> 265159 (-0.07%); split: -0.11%, +0.04%
Max live registers: 32597829 -> 32597986 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5536776 -> 5537048 (+0.00%)

Totals from 1590 (0.25% of 631423) affected shaders:
Instrs: 1146532 -> 1143292 (-0.28%); split: -0.44%, +0.16%
Cycle count: 1230843330 -> 1221263361 (-0.78%); split: -0.83%, +0.05%
Spill count: 15832 -> 15809 (-0.15%); split: -0.19%, +0.04%
Fill count: 36071 -> 35888 (-0.51%); split: -0.79%, +0.29%
Max live registers: 93529 -> 93686 (+0.17%); split: -0.00%, +0.17%
Max dispatch width: 15168 -> 15440 (+1.79%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149564084 -> 149562467 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15151701515 -> 15158290114 (+0.04%); split: -0.00%, +0.04%
Max live registers: 32249443 -> 32249620 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5540536 -> 5540488 (-0.00%)

Totals from 1605 (0.25% of 630303) affected shaders:
Instrs: 584950 -> 583333 (-0.28%); split: -0.49%, +0.21%
Cycle count: 160926321 -> 167514920 (+4.09%); split: -0.05%, +4.14%
Max live registers: 90851 -> 91028 (+0.19%); split: -0.00%, +0.20%
Max dispatch width: 15440 -> 15392 (-0.31%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
864268ff0d intel/brw: Algebraic optimizations for CSEL
No shader-db or fossil-db changes on any Intel platform. In this MR, the
only benefit of these changes is to convert some "-a > 0" CSEL
comparisons to "a < 0" for improved readability.

v2: Add integer CSEL support

v3: Use fs_inst::resize_sources and brw_type_is_sint. Both suggested by
Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
033405cd4b intel/brw: Combine constants and constant propagation for CSEL
No shader-db or fossil-db changes on any Intel platform. This ends up
begin helpful in "intel/brw: Use range analysis to optimize fsign."

v2: Add integer CSEL support

v3: Massive simplification (-20 lines!) of constant propagation
logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm.
Noticed by Ken.

v4: While MAD can mix F and HF sources on some platforms, CSEL
cannot. Found by skqp on TGL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
504b742b83 intel/brw: Update CSEL source type validation
Gfx9 can only have F, but newer GPUs can have F, HF, *D, or *W. The
source and destination types must still match in size.

v2: Simplify the float vs integer logic. Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
3f151c03af intel/brw: Handle fsign optimization in a NIR algebraic pass
This is a lot less code, and it makes it easier to experiment with other
pattern-based optimizations in the future.

The results here are nearly identical to the results I got from Ken's
"intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not
particularly good.

In this commit and in Ken's, all of the shader-db shaders hurt for
spills and fills are from Deus Ex Mankind Divided. Each shader has a
bunch of texture instructions with a single fsign between the
blocks. With the dependency on the flag removed, the scheduler puts all
of the texture instructions at the start... and there are a LOT of them.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19647060 -> 19650207 (0.02%)
instructions in affected programs: 734718 -> 737865 (0.43%)
helped: 382 / HURT: 1984

total cycles in shared programs: 823238442 -> 822785913 (-0.05%)
cycles in affected programs: 426901157 -> 426448628 (-0.11%)
helped: 3408 / HURT: 3671

total spills in shared programs: 3887 -> 3891 (0.10%)
spills in affected programs: 256 -> 260 (1.56%)
helped: 0 / HURT: 4

total fills in shared programs: 3236 -> 3306 (2.16%)
fills in affected programs: 882 -> 952 (7.94%)
helped: 0 / HURT: 12

LOST:   37
GAINED: 34

fossil-db:

DG2 and Meteor Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00%
Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04%
Spill count: 142078 -> 142090 (+0.01%)
Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01%
Max live registers: 32593578 -> 32593858 (+0.00%)
Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01%

Totals from 5867 (0.93% of 631350) affected shaders:
Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09%
Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39%
Spill count: 26411 -> 26423 (+0.05%)
Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04%
Max live registers: 431561 -> 431841 (+0.06%)
Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63%

Tiger Lake
Totals:
Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03%
Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01%
Max live registers: 32249201 -> 32249464 (+0.00%)
Max dispatch width: 5540608 -> 5540584 (-0.00%)

Totals from 5862 (0.93% of 630309) affected shaders:
Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10%
Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72%
Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08%
Max live registers: 424560 -> 424823 (+0.06%)
Max dispatch width: 50304 -> 50280 (-0.05%)

Ice Lake
Totals:
Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00%
Spill count: 60087 -> 60090 (+0.00%)
Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32605215 -> 32605495 (+0.00%)
Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00%

Totals from 5882 (0.93% of 634934) affected shaders:
Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10%
Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05%
Spill count: 10278 -> 10281 (+0.03%)
Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01%
Max live registers: 424184 -> 424464 (+0.07%)
Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18%

Skylake
Totals:
Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00%
Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00%
Spill count: 58176 -> 58172 (-0.01%)
Fill count: 95877 -> 95796 (-0.08%)
Max live registers: 31924594 -> 31924874 (+0.00%)
Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00%

Totals from 5789 (0.93% of 625512) affected shaders:
Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10%
Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05%
Spill count: 9248 -> 9244 (-0.04%)
Fill count: 19677 -> 19596 (-0.41%)
Max live registers: 415340 -> 415620 (+0.07%)
Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
cd343fb9ac intel/brw: Add support for fcsel opcodes
Don't enable nir_opt_algebraic to generate these opcodes yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
d51ad9f4e0 intel/brw: Use fs_inst::resize_sources in brw_fs_opt_algebraic
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
11c6b6c102 intel/elk: Remove dsign optimization
This bit from the comment should have been a big red flag:

    There are currently zero instances of fsign(double(x))*IMM in
    shader-db or any test suite, so it is hard to care at this time.

The implementation of that path was incorrect. The XOR instructions
should be predicated like the OR instruction in the non-multiplication
path. As a result, dsign(zero_value) * x will not produce the correct
result.

Instead of fixing this code that is never exercised by anything, replace
it with the simple lowering in NIR.

Ironically, the vec4 implementation is correct. The odds of encountering
an application that is performace limited by dsign performance in vertex
processing stages on Ivy Bridge or Haswell is infinitesimal.

No shader-db changes on any Intel platform.

v2: Delete 's' in emit_fsign as it is now unused.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Ian Romanick
ded8690336 intel/brw: Remove dsign optimization
This bit from the comment should have been a big red flag:

    There are currently zero instances of fsign(double(x))*IMM in
    shader-db or any test suite, so it is hard to care at this time.

The implementation of that path was incorrect. The XOR instructions
should be predicated like the OR instruction in the non-multiplication
path. As a result, dsign(zero_value) * x will not produce the correct
result.

Instead of fixing this code that is never exercised by anything, replace
it with the simple lowering in NIR.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Caio Oliveira
b8dbd64267 intel/brw: Fix commas when dumping instructions
Some commas were being skipped, according to history as an attempt
to elide BAD_FILEs, but we still print them, so be consistent.  Also
for instructions without any sources, the trailing comma was always
being printed.  Fix that too.

Example of instruction output before the change

    halt_target(8) (null):UD,
    send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD(null):UD NoMask

and after it

    halt_target(8) (null):UD
    send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD, (null):UD NoMask

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
2024-05-11 02:17:57 +00:00
Caio Oliveira
c9fe20fdf1 intel/brw: Use vNN instead of vgrfNN when printing instructions
Reduce the noise in the shader dump output.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
2024-05-11 02:17:56 +00:00
Caio Oliveira
3a081106b0 intel/brw: Hide register pressure information in dumps
It was the default to show register pressure for each instruction,
but it gets in the way of cleaner diffs before/after an optimization pass.
Add INTEL_DEBUG=reg-pressure option to show it again.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
2024-05-11 02:17:56 +00:00
Caio Oliveira
866b1245e9 intel/brw: Don't print IP as part of the dump
The sequential IP cause noise when diffing before/after a pass that
either add or remove instructions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
2024-05-11 02:17:56 +00:00
Lionel Landwerlin
fd47f90d37 brw: drop dependency on libintel_common
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11136
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29128>
2024-05-11 01:52:01 +00:00
Lionel Landwerlin
d1c01e256d brw: add more condition for reducing sampler simdness
Running
KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupColor test
with Zink on Anv we run into an assert :

assert(inst->mlen <= MAX_SAMPLER_MESSAGE_SIZE * reg_unit(devinfo));

Turns out we've not covered all the cases in the SIMD lowering.

It's a bit of a shame to have both files reproduce the same logic.
Will try to think of a better way to extract the layout of the a send
message but that'll be a much bigger rework.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29118>
2024-05-10 19:40:00 +00:00
Sagar Ghuge
69fc7ee622 intel/disasm: Fix cache load/store disassembly for URB messages
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28868>
2024-05-09 19:45:18 +00:00
Faith Ekstrand
9d5b4a4ffd intel/kernel: Use the new capabilities struct
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28905>
2024-05-09 01:14:23 +00:00
Sagar Ghuge
e32828f5fc intel/compiler: Fix destination type for CMP/CMPN
For CMP/CMPN, use src0 type if destination is null otherwise get the
src0 type register with destination register size.

This fixes dEQP-VK.glsl.builtin_var.frontfacing.* tests cases on Xe2+.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28679>
2024-05-06 21:46:18 +00:00
Ian Romanick
0fa17962d6 intel/elk: Fix optimize_extract_to_float for i2f of unsigned extract
Fixes fs-uint-to-float-of-extract-int8.shader_test and
fs-uint-to-float-of-extract-int16.shader_test added by piglit!883.

v2: Expand the comment explaining the potential problem. Suggested by
Caio.

Fixes: e6022281f2 ("intel/elk: Rename files to use elk prefix")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
2024-05-03 15:01:43 -07:00
Ian Romanick
fc2360167c intel/brw: Avoid optimize_extract_to_float when it will just be undone later
v2: Add bspec quotation. Suggested by Caio. With better understand of
the restriction, only apply on DG2 and newer platforms.

shader-db:

DG2 and Meteor Lake had similar results. (DG2 shown)
total instructions in shared programs: 19659363 -> 19659360 (<.01%)
instructions in affected programs: 2484 -> 2481 (-0.12%)
helped: 6 / HURT: 1

total cycles in shared programs: 823445738 -> 823432524 (<.01%)
cycles in affected programs: 2619836 -> 2606622 (-0.50%)
helped: 48 / HURT: 63

fossil-db:

DG2 and Meteor Lake had similar results. (DG2 shown)

Totals:
Instrs: 154015863 -> 153987806 (-0.02%); split: -0.02%, +0.00%
Cycle count: 17552172994 -> 17562047866 (+0.06%); split: -0.13%, +0.19%
Spill count: 142124 -> 141544 (-0.41%); split: -0.54%, +0.13%
Fill count: 266803 -> 266046 (-0.28%); split: -0.38%, +0.09%
Scratch Memory Size: 10266624 -> 10271744 (+0.05%); split: -0.02%, +0.07%
Max live registers: 32592428 -> 32592393 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5535944 -> 5535912 (-0.00%); split: +0.00%, -0.00%

Totals from 41887 (6.63% of 631367) affected shaders:
Instrs: 32971032 -> 32942975 (-0.09%); split: -0.10%, +0.01%
Cycle count: 3892086217 -> 3901961089 (+0.25%); split: -0.60%, +0.85%
Spill count: 105669 -> 105089 (-0.55%); split: -0.72%, +0.18%
Fill count: 206459 -> 205702 (-0.37%); split: -0.49%, +0.12%
Scratch Memory Size: 7766016 -> 7771136 (+0.07%); split: -0.03%, +0.09%
Max live registers: 3230515 -> 3230480 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 337232 -> 337200 (-0.01%); split: +0.00%, -0.01%

No shader-db or fossil-db changes on any earlier Intel platforms.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
2024-05-03 15:01:43 -07:00
Ian Romanick
bf5d82654a intel/brw: Fix optimize_extract_to_float for i2f of unsigned extract
Fixes fs-uint-to-float-of-extract-int8.shader_test and
fs-uint-to-float-of-extract-int16.shader_test added by piglit!883.

No shader-db or fossil-db changes on any Intel platform.

v2: Expand the comment explaining the potential problem. Suggested by
Caio.

Fixes: 29ce110be6 ("i965/fs: Remove extract virtual opcodes.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
2024-05-03 15:01:43 -07:00
Kenneth Graunke
8d983b3425 intel/nir: Set src_type on TCS quads workaround store_output
We weren't setting this and now it's validated, causing assert failures.

Fixes: 1632948a76 ("nir: validate src_type of store_output intrinsics, require bit_size >= 16")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11107
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29027>
2024-05-02 13:58:21 -07:00
Kenneth Graunke
84139470a5 intel/brw: Use VEC for emit_unzip()
Helps make SIMD-split code more SSA-friendly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:54 -07:00