84 commits

Author SHA1 Message Date
Ian Romanick
d0f1a94e3d brw/build: Prepare BROADCAST for scalar values
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
1bff4f93ca brw: Basic infrastructure to store convergent values as scalars
In SIMD16 and SIMD32, storing convergent values in full 16- or
32-channel registers is wasteful. It wastes register space, and in most
cases on SIMD32, it wastes instructions. Our register allocator is not
clever enough to handle scalar allocations. Its fundamental unit of
allocation is SIMD8. Start treating convergent values as SIMD8.

Add a tracking bit in brw_reg to specify that a register represents a
convergent, scalar value. This has two implications:

1. All channels of the SIMD8 register must contain the same value. In
   general, this means that writes to the register must be
   force_writemask_all and exec_size = 8;

2. Reads of this register can (and should) use <0;1,0> stride. SIMD8
   instructions that have restrictions on source stride can use <8;8,1>.

Values that are vectors (e.g., results of load_uniform or texture
operations) will be stored as multiple SIMD8 hardware registers.
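
To illustrate the second implication, here is a minimal sketch of how a
<vstride;width,hstride> source region maps execution channels to register
elements. The names `region`, `element_for_channel` and `read_region` are
hypothetical and exist only for this illustration; they are not part of the
compiler. The register is modeled as a flat array of elements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a source region <vstride;width,hstride>.
struct region {
   unsigned vstride, width, hstride;
};

// Element index (relative to the region origin) read by one channel.
inline unsigned element_for_channel(region r, unsigned channel)
{
   assert(r.width > 0);
   const unsigned row = channel / r.width;
   const unsigned col = channel % r.width;
   return row * r.vstride + col * r.hstride;
}

// Read exec_size channels from a register modeled as a flat array.
inline std::vector<unsigned>
read_region(const std::vector<unsigned> &reg, region r, unsigned exec_size)
{
   std::vector<unsigned> out;
   for (unsigned ch = 0; ch < exec_size; ch++) {
      const unsigned idx = element_for_channel(r, ch);
      assert(idx < reg.size());
      out.push_back(reg[idx]);
   }
   return out;
}
```

With <0;1,0>, every channel of a SIMD16 or SIMD32 read lands on element 0,
so one SIMD8 register is enough. With <8;8,1> in SIMD8, each channel reads
its own element, which yields the same value as long as all eight channels
of the register were written identically.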

v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2.

v3: Eliminate offset_to_scalar(). Remove mention of vec4 backend in
brw_reg.h. Both suggested by Caio. The offset_to_scalar() change
necessitates some trickery in the fs_builder offset() function, but I
think this is an improvement overall. There is also some rework in
find_value_for_offset to account for the possibility that is_scalar
sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Caio Oliveira
abe41b1d2c intel/compiler: Use #pragma once instead of header guards
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32534>
2024-12-11 19:47:44 +00:00
Ian Romanick
662339a2ff brw/build: Use SIMD8 temporaries in emit_uniformize
The fossil-db results are very different from v1. This is now mostly
helpful on older platforms.

v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV,
adjust the exec_size to match the size allocated for the destination
register. Fixes EU validation failures in some piglit OpenCL tests
(e.g., atomic_add-global-return.cl).

v3: Use component_size() in emit_uniformize and BROADCAST to properly
account for UQ vs UD destination. This doesn't matter for
emit_uniformize because the type is always UD, but it is technically
more correct.

v4: Update trace checksums. Now amly expects the same checksum as
several other platforms.

v5: Use xbld.dispatch_width() in the builder for when scalar_group()
eventually becomes SIMD1. Suggested by Lionel.

shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18091701 -> 18091586 (<.01%)
instructions in affected programs: 29616 -> 29501 (-0.39%)
helped: 28 / HURT: 18

total cycles in shared programs: 919250494 -> 919123828 (-0.01%)
cycles in affected programs: 12201102 -> 12074436 (-1.04%)
helped: 124 / HURT: 108

LOST:   0
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20480808 -> 20480624 (<.01%)
instructions in affected programs: 58465 -> 58281 (-0.31%)
helped: 61 / HURT: 20

total cycles in shared programs: 874860168 -> 874960312 (0.01%)
cycles in affected programs: 18240986 -> 18341130 (0.55%)
helped: 113 / HURT: 158

total spills in shared programs: 4557 -> 4555 (-0.04%)
spills in affected programs: 93 -> 91 (-2.15%)
helped: 1 / HURT: 0

total fills in shared programs: 5247 -> 5243 (-0.08%)
fills in affected programs: 224 -> 220 (-1.79%)
helped: 1 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 14102592 -> 14102624 (+0.00%)
Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02%
Max live registers: 65371025 -> 65355084 (-0.02%)

Totals from 12130 (1.73% of 702392) affected shaders:
Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08%
Subgroup size: 388128 -> 388160 (+0.01%)
Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81%
Max live registers: 1538550 -> 1522609 (-1.04%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 9631168 -> 9631216 (+0.00%)
Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01%
Max live registers: 41540611 -> 41514296 (-0.06%)
Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05%

Totals from 16852 (2.11% of 796880) affected shaders:
Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07%
Subgroup size: 323592 -> 323640 (+0.01%)
Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58%
Max live registers: 1072491 -> 1046176 (-2.45%)
Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30%

Tiger Lake
Totals:
Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00%
Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01%
Max live registers: 41644106 -> 41620052 (-0.06%)
Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02%

Totals from 15102 (1.90% of 793371) affected shaders:
Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11%
Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52%
Max live registers: 989858 -> 965804 (-2.43%)
Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98%

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00%
Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00%
Spill count: 567071 -> 567049 (-0.00%)
Fill count: 701323 -> 701273 (-0.01%)
Max live registers: 41914047 -> 41913281 (-0.00%)
Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00%

Totals from 3904 (0.49% of 798473) affected shaders:
Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03%
Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25%
Spill count: 1696 -> 1674 (-1.30%)
Fill count: 2523 -> 2473 (-1.98%)
Max live registers: 341695 -> 340929 (-0.22%)
Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Lionel Landwerlin
69edf4144a brw: use transpose unspill messages when possible
This simplifies the unspill messages quite a bit.

A/B testing on DG2 :

BlackOps3 : +0.96%
TotalWarPharaoh: +0.31%

DG2 shader changes :

  Assassin's Creed Valhalla:
  Totals from 19 (0.89% of 2131) affected shaders:
  Instrs: 70542 -> 64369 (-8.75%)
  Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06%

  Black Ops 3:
  Totals from 55 (3.41% of 1612) affected shaders:
  Instrs: 389549 -> 350646 (-9.99%)
  Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15%

  Control:
  Totals from 1 (0.11% of 878) affected shaders:
  Instrs: 3409 -> 3212 (-5.78%)
  Cycle count: 255991 -> 250411 (-2.18%)

  Cyberpunk 2077:
  Totals from 1 (0.08% of 1264) affected shaders:
  Instrs: 2363 -> 2337 (-1.10%)
  Cycle count: 69283 -> 69186 (-0.14%)

  Fallout 4:
  Totals from 1 (0.06% of 1601) affected shaders:
  Instrs: 27946 -> 20056 (-28.23%)
  Cycle count: 2391398 -> 2153658 (-9.94%)

  Fortnite:
  Totals from 273 (3.65% of 7470) affected shaders:
  Instrs: 634377 -> 601519 (-5.18%)
  Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01%

  Hogwarts Legacy:
  Totals from 50 (3.02% of 1656) affected shaders:
  Instrs: 110455 -> 103339 (-6.44%)
  Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03%

  Metro Exodus:
  Totals from 70 (0.16% of 43076) affected shaders:
  Instrs: 253847 -> 245321 (-3.36%)
  Cycle count: 13269473 -> 13209131 (-0.45%)
  Spill count: 1111 -> 1108 (-0.27%)
  Fill count: 2868 -> 2865 (-0.10%)

  Red Dead Redemption 2:
  Totals from 139 (2.38% of 5847) affected shaders:
  Instrs: 496551 -> 450180 (-9.34%)
  Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04%
  Spill count: 6322 -> 6326 (+0.06%)
  Fill count: 15558 -> 15568 (+0.06%)

  Rise Of The Tomb Raider:
  Totals from 1 (0.56% of 178) affected shaders:
  Instrs: 1682 -> 1437 (-14.57%)
  Cycle count: 603670 -> 586766 (-2.80%)

  Spiderman Remastered:
  Totals from 820 (11.77% of 6965) affected shaders:
  Instrs: 4622877 -> 3984893 (-13.80%)
  Cycle count: 235094963186 -> 234483925430 (-0.26%); split: -0.42%, +0.16%
  Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25%
  Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27%
  Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35%

Some of the stats show spilling changes which is telling of how our spill
code is not adequate. Some of the spilled values are probably being
respilled which shouldn't be the case.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>
2024-12-04 08:59:07 +00:00
Ian Romanick
2a57568ebd brw/build: Add scalar_group() helper
Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is
deleted by commits in !29884. The use in brw_lower_logical_sends.cpp seems
different, so I decided to keep it.

The next commit wants to use this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
de45273307 brw/builder: Add new style ALU3 builder
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:45 +00:00
Caio Oliveira
f20df2984d intel/brw: Ensure BROADCAST() value respect register alignment
If we have a non-register-aligned source, MOV it to a new register
so that the invariant expected when generating SHADER_OPCODE_BROADCAST
is respected.

Added to ensure a later patch won't hit the `src.subnr == 0` assertion
in brw_broadcast() generation code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Caio Oliveira
d97381efd8 intel/brw: Add fs_builder::BROADCAST() helper
The helper already takes care of using exec_all() and taking the first
component of the result.  Both are expected by SHADER_OPCODE_BROADCAST.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Caio Oliveira
b9787fcc80 intel/brw: Move emit_scan/emit_scan_step near its usage
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
e4f090d3a6 intel/brw: Remove special treatment for 2-src in emit() helper
For Gfx9+ no 2-src instructions need sources to be fixed up.  Special
treatment remains for 3-src instructions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30911>
2024-08-30 04:33:47 +00:00
Caio Oliveira
695f5314d6 intel/brw: Simplify fs_inst annotation
When INTEL_DEBUG=ann is also set, the disassembler would annotate the
output with either a string or the string version of a NIR instruction.
This was done by keeping two pointers (but only using one at a time).

Change the code to print the instruction into a string instead of
keeping its pointer around (peg the string to the shader).  That way,
only one pointer is needed for annotations.  Because that serialization
is not free, only do that when the environment variable is set.

Since we are here, move the annotation string field to the end, moving
it to the least commonly used cacheline.  Further packing might allow
the entire fs_inst to fit in two cachelines.

For release builds, don't even add the debug annotation to the struct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>
2024-08-28 03:59:50 +00:00
Caio Oliveira
2e2b83f72d intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION
Instead of emitting a single one at the top, and making reference to it,
emit the virtual instruction as needed and let CSE do its job.
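
As a rough illustration of why emitting the instruction on demand is
harmless, here is a minimal sketch of common subexpression elimination over
a straight-line list of instructions. It is a toy: `toy_inst` and `toy_cse`
are hypothetical names, every destination is assumed to be written exactly
once, and there is no control flow. It does not reflect how the real CSE
pass is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instruction: opcode, destination number, source numbers.
struct toy_inst {
   std::string opcode;
   int dst;
   std::vector<int> srcs;
};

// Removes instructions that recompute an already available value.
// Returns a map from each eliminated destination to the surviving one.
inline std::map<int, int> toy_cse(std::vector<toy_inst> &insts)
{
   std::map<std::pair<std::string, std::vector<int>>, int> seen;
   std::map<int, int> remap;
   std::vector<toy_inst> kept;

   for (toy_inst inst : insts) {
      assert(inst.dst >= 0);

      // Rewrite sources that refer to eliminated destinations.
      for (int &s : inst.srcs) {
         auto r = remap.find(s);
         if (r != remap.end())
            s = r->second;
      }

      auto key = std::make_pair(inst.opcode, inst.srcs);
      auto it = seen.find(key);
      if (it != seen.end()) {
         remap[inst.dst] = it->second;   // duplicate: reuse earlier result
      } else {
         seen.emplace(key, inst.dst);
         kept.push_back(inst);
      }
   }

   insts = kept;
   return remap;
}
```

In this model, two LOAD_SUBGROUP_INVOCATION instructions emitted at
different points collapse into one, and later users are rewritten to read
the surviving destination.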

Since load_subgroup_invocation now can appear not at the start of the
shader, use UNDEF in all cases to ensure that the liveness of the
destination doesn't extend to the first partial write done here (it was
being used only for SIMD > 8 before).

Note this option was considered in the past (6132992cdb) but dismissed
at the time.  The
difference now is that the lowering of the virtual instruction happens
earlier than the scheduling.

The motivation for this change is to allow passes other than the NIR
conversion to use this value.  The alternative of storing a `brw_reg` in
the shader (instead of NIR state) gets complicated by passes like
compact_vgrfs, that move VGRFs around (and update the instructions).
This and maybe other passes would have to care about the brw_reg.

Fossil-db numbers, TGL

```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

*** Shaders only in 'before' results are ignored:
steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more
from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider

Totals:
Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00%
Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03%
Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01%
Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31725096 -> 31671792 (-0.17%)
Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00%

Totals from 52574 (8.34% of 630441) affected shaders:
Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02%
Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29%
Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03%
Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02%
Max live registers: 1992493 -> 1939189 (-2.68%)
Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05%
```

and DG2

```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

*** Shaders only in 'before' results are ignored:
steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more
from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

Totals:
Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7699776 -> 7699784 (+0.00%)
Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02%
Spill count: 61283 -> 61274 (-0.01%)
Fill count: 119886 -> 119849 (-0.03%)
Max live registers: 31810432 -> 31758920 (-0.16%)
Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06%

Totals from 49286 (7.81% of 631231) affected shaders:
Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04%
Subgroup size: 857752 -> 857760 (+0.00%)
Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72%
Spill count: 6339 -> 6330 (-0.14%)
Fill count: 12571 -> 12534 (-0.29%)
Max live registers: 1788346 -> 1736834 (-2.88%)
Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66%
```

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>
2024-08-08 18:20:49 +00:00
Caio Oliveira
a5cc8c4807 intel/brw: Move VARYING_PULL_CONSTANT_LOAD from fs_visitor to fs_builder
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
d00329e821 intel/brw: Replace some fs_reg constructors with functions
Create three helper functions for ATTR, UNIFORM and VGRF creation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:18 +00:00
Kenneth Graunke
1e69ec3b8d intel/brw: Add a lower_csel pass and allow building it for all types
We can do CSEL on F, HF, *W, and *D on Gfx11+.  Gfx9 can only do F.

We can lower unsupported types to CMP+CSEL, allowing us to use CSEL
in the IR and not worry about the limitations.

Rework: (Sagar)
- Update validation pass for CSEL

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
2024-07-01 19:06:31 +00:00
Ian Romanick
77ef241577 intel/brw/xe2+: Scale size_written by reg_unit for DPAS
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>
2024-06-25 14:17:47 -07:00
Kenneth Graunke
5cb15a6c67 intel/brw: Make bld.ADD(x, 0) emit no instructions and return x directly
There are a lot of places where we add 0 to an offset.  Avoiding
generating this can save us algebraic + copy_propagation later.
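
The idea can be sketched with a toy builder. Everything below (`toy_reg`,
`toy_inst`, `toy_builder`, `alloc`, `imm`) is a hypothetical stand-in for
illustration and is much simpler than the real builder; only the shortcut
for an immediate zero mirrors what this commit describes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register: either a virtual register or an immediate.
struct toy_reg {
   enum kind_t { VGRF, IMM } kind;
   uint32_t value;   // VGRF number or immediate value
};

inline toy_reg imm(uint32_t v) { return {toy_reg::IMM, v}; }

struct toy_inst {
   toy_reg dst, src0, src1;
};

struct toy_builder {
   std::vector<toy_inst> insts;
   uint32_t next_vgrf = 0;

   // Allocate a fresh virtual register.
   toy_reg alloc() { return {toy_reg::VGRF, next_vgrf++}; }

   toy_reg ADD(toy_reg x, toy_reg y)
   {
      // This toy only handles a register as the first operand.
      assert(x.kind == toy_reg::VGRF);

      // x + 0 == x: return the source and emit nothing.
      if (y.kind == toy_reg::IMM && y.value == 0)
         return x;

      toy_reg dst = alloc();
      insts.push_back({dst, x, y});
      return dst;
   }
};
```

Because the caller receives `x` itself, there is no ADD for the algebraic
pass to simplify and no copy for copy propagation to clean up afterwards.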

Cuts compile time in Borderlands 3 by -0.590631% +/- 0.170108% (n=25).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>
2024-06-24 19:12:21 -07:00
Kenneth Graunke
068865ce81 intel/brw: Make an alu2 builder helper
Instead of replicating the whole thing in macros, just make an alu2()
function and use that in the wrappers.  It ought to get inlined anyway.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>
2024-06-24 19:12:19 -07:00
Kenneth Graunke
344d4ee9f0 intel/brw: Make VEC() perform a single write to its destination.
This gathers a number of sources into a contiguous vector register,
typically using LOAD_PAYLOAD.  However, it uses MOV for a single source.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
f04bb49465 intel/brw: Delete SAD2 and SADA2 opcodes
These were removed with Icelake.  While they technically still exist on
Skylake, which this compiler supports, we have never used these opcodes
in the 14 years we could have done so.  So just scrap them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29665>
2024-06-10 16:47:50 -07:00
Francisco Jerez
6261f4d361 intel/brw/xe2+: Fix 64-bit subgroup scan intrinsics not to rely on SEL instructions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:51 +00:00
Sagar Ghuge
e32828f5fc intel/compiler: Fix destination type for CMP/CMPN
For CMP/CMPN, use src0 type if destination is null otherwise get the
src0 type register with destination register size.

This fixes dEQP-VK.glsl.builtin_var.frontfacing.* tests cases on Xe2+.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28679>
2024-05-06 21:46:18 +00:00
Kenneth Graunke
3c867bf2c7 intel/brw: Add a new VEC() helper.
This gathers a number of sources into a contiguous vector register.
Eventually, the plan is that it will use a MOV for a single source,
or LOAD_PAYLOAD for multiple sources.  For now, it emits a series of
MOVs to allow us to rewrite a bunch of existing code to use the new
helper, then change them all over at once later.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:42 -07:00
Kenneth Graunke
674e89953f intel/brw: Use new builder helpers that allocate a VGRF destination
With the previous commit, we now have new builder helpers that will
allocate a temporary destination for us.  So we can eliminate a lot
of the temporary naming and declarations, and build up expressions.

In a number of cases here, the code was confusingly mixing D-type
addresses with UD-immediates, or expecting a UD destination.  But the
underlying values should always be positive anyway.  To accommodate the
type inference restriction that the base types must match, we switch
these over to be purely UD calculations.  It's cleaner to do so anyway.

Compared to the old code, this may in some cases allocate additional
temporary registers for subexpressions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
4c2c49f7bc intel/brw: Add builder helpers that allocate temporary destinations
In many cases, we calculate an expression by generating a series of
instructions.  We'd either overwrite the same register repeatedly,
or call vgrf(BRW_TYPE_X) repeatedly to allocate temporaries for each
intermediate step.  In many cases, we overwrote the same register simply
because allocating and naming temporaries for each step was annoying.

This commit adds new builder helpers that will allocate a temporary
destination for you, using simple type inference: unary operations
use the source type, and binary operations require a matching base type
and return the largest of the two types.

The helpers return the destination register, allowing us to write in an
expression-tree style, chaining together builder operations to produce
whole values.  Sort of like nir_builder.  We still optionally will write
out the fs_inst pointer in case the caller wants to do things like set
predicates or saturation.
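
The inference rule described above can be sketched as follows. `base_t`,
`toy_type`, `infer_unary` and `infer_binary` are hypothetical names for this
illustration; the real helpers operate on the compiler's register types, and
"largest" is assumed here to mean the larger size in bits.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical simplified type: a base kind plus a size in bits.
enum class base_t { SINT, UINT, FLOAT };

struct toy_type {
   base_t base;
   unsigned bits;
};

// Unary operations use the source type unchanged.
inline toy_type infer_unary(toy_type src)
{
   return src;
}

// Binary operations require matching base types and return the
// larger of the two types.
inline toy_type infer_binary(toy_type a, toy_type b)
{
   assert(a.base == b.base);
   return {a.base, std::max(a.bits, b.bits)};
}
```

For example, combining a 16-bit and a 32-bit unsigned operand would give a
32-bit unsigned destination, while mixing a signed and an unsigned operand
would trip the assertion.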

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
319ba85e10 intel/brw: Add builder helpers for math functions
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
c22f44ff07 intel/brw: Replace brw_reg_type_from_bit_size by brw_type_with_size
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
f523bfcf90 intel/brw: Reindent after shortening BRW_REGISTER_TYPE_* to BRW_TYPE_*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
e637c63239 intel/brw: Make an fs_builder::SYNC helper
We always want a null destination, so this saves some typing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Ian Romanick
6d85f7129a intel/brw/xe2+: DPAS must be SIMD16 now
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Ian Romanick
671745b616 intel/fs: Don't allow 0 stride on MOV destination
Outside SIMD1 instructions, a destination stride of zero doesn't make
any sense. When such strides exist, they would be fixed by the FS
generator. Currently the only place that intentionally generates such a
stride is setup_barrier_message_payload_gfx125, and this commit changes
that.

The existence of a zero stride that won't really be a zero stride causes
a variety of problems with other optimization passes. Those passes don't
know that 0 actually means 1, and they make incorrect assumptions about
sizes written, etc.
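
A minimal sketch of the problem, under stated assumptions: `toy_dst`,
`naive_size_written` and `dst_stride_ok` are hypothetical, and the size
formula is a simplified model of what an optimization pass might compute,
not the compiler's actual code. The check also ignores the per-register-file
distinction mentioned in v2 below.

```cpp
#include <cassert>

// Hypothetical, simplified destination description.
struct toy_dst {
   unsigned stride;       // in elements
   unsigned type_bytes;   // size of one element
};

// Naive model: bytes written = channels * stride * element size.
// A stride of 0 makes this 0 even though every channel is written.
inline unsigned naive_size_written(toy_dst dst, unsigned exec_size)
{
   assert(dst.type_bytes > 0);
   return exec_size * dst.stride * dst.type_bytes;
}

// Check in the spirit of the assertion: zero stride only for SIMD1.
inline bool dst_stride_ok(toy_dst dst, unsigned exec_size)
{
   return exec_size == 1 || dst.stride != 0;
}
```

In this model a SIMD8 write of 4-byte elements with stride 0 appears to
write nothing, which is the kind of wrong conclusion the assertion is meant
to rule out.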

The assertion helped catch many bugs in some other work in progress that
tries to store convergent values in SIMD8 registers regardless of the
dispatch width. That code would accidentally generate destination
strides of zero.

v2: Check stride differently depending on register file. Suggested by
Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28256>
2024-03-19 18:17:59 +00:00
Caio Oliveira
97759ef139 intel/brw: Remove typedefs from fs_builder
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>
2024-02-29 21:14:13 -08:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Caio Oliveira
5c93a0e125 intel/brw: Remove Gfx8- remaining opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
b6098676fa intel/brw: Remove Gfx8- code from builder
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
071e9f49f1 intel/brw: Remove F16TO32 and F32TO16 opcodes
These are done with MOVs and appropriate types in Gfx9+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Ian Romanick
e666872c75 intel/compiler: Initial bits for DPAS instruction
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.

v3: Prevent lower_regioning from trying to "fix" DPAS sources.

v4: Add instruction latency information for scheduling and perf
estimates.

v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.

v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Caio Oliveira
38a42e5aa1 intel/compiler: Add ctor to fs_builder that just takes the shader
Uses the dispatch_width from the shader (fs_visitor).  This was not
possible before because the dispatch_width was not part of
backend_shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
cf730adc58 intel/compiler: Make fs_builder include fs_visitor and not the other way
This will allow fs_builder to have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
f5032c4d52 intel/compiler: Make fs_visitor not depend on fs_builder
At this point this is more a header dependency due to inline functions,
so shuffle them around.  The end goal is to allow fs_builder to have a
reference to a fs_visitor (really a fs_shader).

Note the header is still included, a later patch will move the includes
to the call-sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
21cf9323f0 intel/compiler: Add a few more helpers to fs_builder
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25216>
2023-11-30 20:58:05 +00:00
Francisco Jerez
150b3e87c8 intel/fs/xe2+: Round up fs_builder::vgrf() size calculation to HW register unit.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Caio Oliveira
26f6ea5c30 intel/compiler: Remove unused functions and declarations
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23539>
2023-06-09 20:09:51 +00:00
Lionel Landwerlin
3d0cc3f63b intel/fs: keep track of new resource_intel information
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Kenneth Graunke
e7ea2aa46c intel/fs: Make bld.F16TO32 actually emit F16TO32 not F32TO16
Ahem, "add builder helpers that work on Gfx7"...now might actually work.
Too much copy and paste...

Fixes: 966995d911 ("intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Kenneth Graunke
966995d911 intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x
These take care of emitting the F32TO16/F16TO32 instructions on Gfx7.x
but otherwise just emit a type converting MOV on Gfx8+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00