Commit graph

4222 commits

Author SHA1 Message Date
Ian Romanick
20cce95ce5 brw/opt: Don't call brw_opt_copy_propagation before brw_lower_load_reg
On a 36c/72t Xeon system, performance of replaying
hogwarts_legacy.dx12vk-ultra.foz was improved 1.3% +/- 0.77% (n=10).

I picked MTL for the fossil-db results because it was the most negative.

shader-db:

All Intel platforms had fairly similar results. (Lunar Lake shown)
total instructions in shared programs: 16964217 -> 16964216 (<.01%)
instructions in affected programs: 51777 -> 51776 (<.01%)
helped: 20 / HURT: 27

total cycles in shared programs: 892934916 -> 893041912 (0.01%)
cycles in affected programs: 51245298 -> 51352294 (0.21%)
helped: 96 / HURT: 78

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 233678547 -> 233678944 (+0.00%); split: -0.00%, +0.00%
Cycle count: 24398049850 -> 24400490877 (+0.01%); split: -0.01%, +0.02%
Max live registers: 42145052 -> 42145038 (-0.00%); split: -0.00%, +0.00%

Totals from 1141 (0.14% of 805934) affected shaders:
Instrs: 1546001 -> 1546398 (+0.03%); split: -0.01%, +0.03%
Cycle count: 1201746062 -> 1204187089 (+0.20%); split: -0.14%, +0.34%
Max live registers: 84247 -> 84233 (-0.02%); split: -0.03%, +0.01%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
991a2f510b brw/sat: Eliminate non-defs saturate propagation
The intervening_saturating_copy test is removed. The defs version of the
pass does not handle this case. It should not occur often in practice
anyway. Copy propagation and brw_nir_opt_fsat should prevent this
scenario from happening.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 212677275 -> 212677278 (+0.00%)
Cycle count: 30466062848 -> 30466056040 (-0.00%)

Totals from 1 (0.00% of 706300) affected shaders:
Instrs: 1343 -> 1346 (+0.22%)
Cycle count: 411664 -> 404856 (-1.65%)

v2: Stop counting ip. The non-defs part of the pass was the only thing
that used it.

v3: Also delete "if (block != def->block) continue;" code. I noticed
this while working on some other changes to this function. It's the last
thing in the loop, so it's totally useless. Delete some other spurious
continues too.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
cc5a6a5ae8 brw/sat: Convert tests to use load_reg
This is in preparation for a commit that removes the non-defs version of
the pass.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
2d13acf9d9 brw: Add passes to generate and lower load_reg
v2: Add support for WE_all instructions... this already just worked, so
I only had to delete the check and the FINISHME comment.

v3: Use logic more like def_analysis::update_for_reads to determine when
to not insert LOAD_REG instructions. Based on a suggestion by Ken.

v4: Eliminate "store" from all the names since STORE_REG does not exist
anymore. Fold insert_load_reg into brw_insert_load_reg. Eliminate extra
call to s.def_analysis.require() after progress. Pull a loop-invariant
check out of the inst->sources loop. Drop call to
brw_opt_split_virtual_grfs after lowering load_reg. All suggested by
Caio.

v5: Assert that LOAD_REG doesn't already exist in
brw_insert_load_reg. Update comment before fully_defines. Both
suggested by Caio.

v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL.
Move the inst->dst.file != VGRF check earlier to avoid the loop over
sources. Both suggested by Ken. Move the call to brw_insert_load_reg
a little bit later, and explain why it's at that location. Suggested
by Caio.

v7: Many changes to the for-each-source loop in brw_insert_load_reg.
Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds
checks for matching SIMD size and NoMask in the search for pre-existing
LOAD_REG of same value.

v8: Add some unit tests. Suggested by Caio.

shader-db:

Lunar Lake
total instructions in shared programs: 16923237 -> 16921895 (<.01%)
instructions in affected programs: 450565 -> 449223 (-0.30%)
helped: 251 / HURT: 377

total cycles in shared programs: 910428418 -> 889920590 (-2.25%)
cycles in affected programs: 719248184 -> 698740356 (-2.85%)
helped: 9076 / HURT: 9082

total fills in shared programs: 2242 -> 2218 (-1.07%)
fills in affected programs: 116 -> 92 (-20.69%)
helped: 2 / HURT: 0

total sends in shared programs: 848635 -> 848421 (-0.03%)
sends in affected programs: 810 -> 596 (-26.42%)
helped: 10 / HURT: 0

LOST:   82
GAINED: 78

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19875784 -> 19871694 (-0.02%)
instructions in affected programs: 1050091 -> 1046001 (-0.39%)
helped: 251 / HURT: 2403

total cycles in shared programs: 905328238 -> 882446458 (-2.53%)
cycles in affected programs: 682736344 -> 659854564 (-3.35%)
helped: 7869 / HURT: 7911

total spills in shared programs: 5512 -> 5032 (-8.71%)
spills in affected programs: 1830 -> 1350 (-26.23%)
helped: 8 / HURT: 0

total fills in shared programs: 5648 -> 4782 (-15.33%)
fills in affected programs: 3312 -> 2446 (-26.15%)
helped: 8 / HURT: 0

total sends in shared programs: 1032942 -> 1032722 (-0.02%)
sends in affected programs: 572 -> 352 (-38.46%)
helped: 10 / HURT: 0

LOST:   138
GAINED: 53

Tiger Lake
total instructions in shared programs: 19711930 -> 19715591 (0.02%)
instructions in affected programs: 1040623 -> 1044284 (0.35%)
helped: 317 / HURT: 2474

total cycles in shared programs: 862988990 -> 860573870 (-0.28%)
cycles in affected programs: 612392461 -> 609977341 (-0.39%)
helped: 7447 / HURT: 7686

total sends in shared programs: 1034763 -> 1034555 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0

LOST:   56
GAINED: 143

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20545461 -> 20545220 (<.01%)
instructions in affected programs: 422405 -> 422164 (-0.06%)
helped: 180 / HURT: 459

total cycles in shared programs: 872697345 -> 866874523 (-0.67%)
cycles in affected programs: 573117917 -> 567295095 (-1.02%)
helped: 6783 / HURT: 6980

total spills in shared programs: 4335 -> 4336 (0.02%)
spills in affected programs: 90 -> 91 (1.11%)
helped: 1 / HURT: 2

total fills in shared programs: 4194 -> 4196 (0.05%)
fills in affected programs: 463 -> 465 (0.43%)
helped: 1 / HURT: 2

total sends in shared programs: 1079446 -> 1079238 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0

LOST:   117
GAINED: 37

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01%
Send messages: 10927753 -> 10927640 (-0.00%)
Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62%
Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08%
Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26%
Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%)

Totals from 503284 (71.25% of 706391) affected shaders:
Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01%
Send messages: 9699732 -> 9699619 (-0.00%)
Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63%
Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08%
Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27%
Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00%
Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
8b2be206f3 brw/algebraic: Constant folding for BROADCAST and SHUFFLE
This prevents assertion failures in brw_eu_emit in a later commit in
this MR. Even though they have not been previously observed, these
assertion failures could happen even without that commit.

No shader-db or fossil-db changes on any Intel platform.

Fixes: 04e1783278 ("brw: Call brw_fs_opt_algebraic less often")

v2: Add SHUFFLE. Suggested by Ken. Fixed indentation.

v3: Update BROADCAST exec_size after rebasing on "brw/build: Use SIMD8
temporaries in emit_uniformize".

v4: Explain why munging the exec_size is correct.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
1b997c7bcc brw/coalesce: Prepare brw_opt_register_coalesce for load_reg
v2: Explain the problematic situation a little better in the
comment. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
15637334ce brw/copy: Prepare copy_propagation for load_reg
The changes to try_copy_propagate will be removed later in the series.

v2: Fix up some comments to note that offset != 0 is allowed only when
stride == 0. Apply same offset=0 restriction in try_copy_propagate_def
too. Allow copy propagation if the source is either a def or
UNIFORM. Don't copy prop a load_reg through a non-def value.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
cfc50390fb brw: Add basic infrastructure for load_reg pseudo op
load_reg is something like load_payload except it has a single
source. It copies the entire source to the destination. Its purpose is
to convert a non-SSA VGRF into an SSA value. This copy is marked as
volatile so that it will act as a scheduling barrier.
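
The idea can be illustrated with a toy model. This is only a sketch: ToyReg, toy_load_reg and snapshot_survives_later_write are hypothetical names invented for illustration and are not part of brw. It shows a single-source copy of an entire register whose result is never written again, so it can be treated like an SSA value even if the source is overwritten later.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy register: a non-SSA VGRF may be written many times.
struct ToyReg {
   std::vector<uint32_t> channels;
};

// Toy LOAD_REG: one source, and the entire source is copied into a fresh
// destination. Nothing writes that destination again, so it acts like an
// SSA value.
ToyReg toy_load_reg(const ToyReg &src)
{
   return ToyReg{src.channels};
}

// Shows that a later write to the source leaves the snapshot unchanged.
bool snapshot_survives_later_write()
{
   ToyReg vgrf{{1, 2, 3, 4}};
   const ToyReg ssa = toy_load_reg(vgrf);   // snapshot taken here
   vgrf.channels[0] = 99;                   // second write to the non-SSA VGRF
   return ssa.channels[0] == 1 && vgrf.channels[0] == 99;
}
```

The toy model ignores scheduling entirely; in the real compiler the copy is marked volatile precisely so that it is not moved across the writes it is meant to observe.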

v2: Fix some typos in the commit message. Eliminate the
brw_builder::LOAD_REG overload that returns a brw_inst*. This is
unlikely to ever be used. Add some checks to brw_validate. All
suggested by Caio.

v3: Force the source and destination types of the LOAD_REG to be
integer. This will (eventually) simplify the creation of unit tests for
the pass that adds LOAD_REG instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Ian Romanick
b9656d51c0 brw/opt: Move non-SSA register accounting after first brw_opt_split_virtual_grfs
v2: Move to immediately before the main optimization loop. Most
importantly, this is after the first call to DCE.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Non SSA regs after NIR: 237045283 -> 100183460 (-57.74%); split: -58.12%, +0.39%

Totals from 701423 (99.26% of 706657) affected shaders:
Non SSA regs after NIR: 236868848 -> 100007025 (-57.78%); split: -58.17%, +0.39%

Suggested-by: Ken
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Caleb Callaway
5ad00bae8b intel/compiler: fix lingering i965 references
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34351>
2025-04-03 03:17:25 +00:00
Ian Romanick
e210b79ce3 brw/nir: Lower fsign again after last call to brw_nir_optimize
No shader-db or fossil-db changes on any Intel platform.

Fixes: 13332c23 ("intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup")
Closes: #12888
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34251>
2025-04-02 01:59:49 +00:00
Ian Romanick
ca95cb8178 brw: Fix typo in comment
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34251>
2025-04-02 01:59:49 +00:00
irql-notlessorequal
255166a349 elk: always write the VUE header
ELK equivalent of !34211, also required to avoid potential rendering errors with hasvk.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34298>
2025-03-31 16:56:13 +00:00
irql-notlessorequal
fe7e0fd4f1 elk: ensure VUE header writes in HS/DS/GS stages
ELK equivalent of !34041, required to avoid potential rendering errors with VK_KHR_maintenance5

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34298>
2025-03-31 16:56:13 +00:00
Lionel Landwerlin
4346210ae6 brw: move texture offset packing to NIR
That way we can deal with upcoming non constant values for
VK_KHR_maintenance8.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
2025-03-29 02:15:18 +00:00
Lionel Landwerlin
67ae49dede intel: move lower_texture to brw
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
2025-03-29 02:15:18 +00:00
Lionel Landwerlin
86773b2ba6 brw: don't lower tg4 offsets without LOD
The problem this fixes is currently hidden because of the order in
which we run nir_lower_tex & intel_nir_lower_texture. The issue is
that nir_lower_tex removes the LOD source in some cases and the second
run of nir_lower_tex can add it back.

This is also only needed on Gfx12.5+ if the LOD is present.

Finally move all of the texture lowering to the postprocess phase. No
need to run this multiple times.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
2025-03-29 02:15:18 +00:00
Lionel Landwerlin
b87dccc64c elk: stop using intel_nir_lower_texture
It's not doing anything.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
2025-03-29 02:15:18 +00:00
Caio Oliveira
63224f64cc brw: Remove adjust_block_ips and brw_inst::remove() with defer
Now that the brw_ip_ranges analysis is being used, there's no
need to track start_ip/end_ip in the blocks as they are mutated.  There is
also no need to call adjust_block_ips at the end of some passes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:51 +00:00
Caio Oliveira
8057cfc49d brw: Use brw_ip_ranges in liveness analysis
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:51 +00:00
Caio Oliveira
a6b0783375 brw: Use brw_ip_ranges in scheduling / regalloc
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:51 +00:00
Caio Oliveira
3659d36087 brw: Use brw_ip_ranges in passes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Caio Oliveira
10660f5adf brw: Add analysis for block IP ranges
Calculate the IP ranges of the shader as an analysis pass.  This will
later replace the existing tracking of start_ip/end_ip as the blocks are
changed (and the defer/adjust scheme to avoid too much churn when that
happens).
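
A minimal sketch of the idea, assuming each block only needs to know its instruction count: the ranges can be recomputed from scratch with a running sum instead of being patched up whenever a block changes. The Block, IpRange and compute_ip_ranges names below are hypothetical and do not reflect the actual brw_ip_ranges interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block: only the instruction count
// matters for this sketch.
struct Block {
   int num_instructions;
};

// Inclusive [start, end] instruction-pointer range of one block.
struct IpRange {
   int start;
   int end;
};

// Recompute every block's IP range in one pass over the blocks.
std::vector<IpRange> compute_ip_ranges(const std::vector<Block> &blocks)
{
   std::vector<IpRange> ranges;
   ranges.reserve(blocks.size());

   int ip = 0;
   for (const Block &b : blocks) {
      // Assumption of this sketch: blocks are never empty.
      assert(b.num_instructions > 0);
      ranges.push_back({ip, ip + b.num_instructions - 1});
      ip += b.num_instructions;
   }
   return ranges;
}
```

For blocks holding 3, 1 and 4 instructions this yields the ranges [0, 2], [3, 3] and [4, 7]. Because the result is derived purely from the per-block counts, a pass that adds or removes instructions only has to invalidate the analysis rather than adjust every later block.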

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Caio Oliveira
fd6045cca9 brw: Track total_instructions in a shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Caio Oliveira
7224b653b5 brw: Use block's num_instructions in scoreboard tests
Stop using start_ip / end_ip; these are not really important for
those tests.  What the tests care about is the number of instructions in
the block, to check for changes and ensure we can peek at them by index.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Caio Oliveira
1139ede508 brw: Track num_instructions in a block
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Caio Oliveira
abe8d35cb8 brw: Remove brw_cfg::dump()
It was used by the pass tests to verify output with TEST_DEBUG=1;
replace it with brw_print_instructions().

The output is slightly different (not printing IP, not reordering the
blocks).  We can add those features as needed, but given that usage was
already very limited, don't bother until the need arises.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Kenneth Graunke
51c67ad7cf brw: Avoid regioning restrictions for u2u16/i2i16 narrowing conversions
Cuts 0.83% of instructions on Alchemist in affected fossil-db shaders
(nearly all of which are in parallel-rdp).

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>
2025-03-28 13:40:07 +00:00
Kenneth Graunke
86f8b8860e brw: Use a smaller type for masked sub-32-bit shift values
Cuts 0.14% of instructions on Alchemist in affected fossil-db shaders
(all of which are in parallel-rdp).

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>
2025-03-28 13:40:07 +00:00
Kenneth Graunke
2e108afb8c brw: Skip unnecessary UNDEFs for comparisons
For example, SIMD16 W/UW fills an entire REG_SIZE so UNDEF isn't needed.
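
The arithmetic behind that example can be sketched as follows. This is an illustration only: the 32-byte register size and the needs_undef helper are assumptions made for the sketch, not the check the commit actually implements.

```cpp
#include <cassert>

// Assumption: a 32-byte register, used here to stand in for REG_SIZE.
constexpr unsigned kRegSizeBytes = 32;

// Hypothetical helper: true when the channels written do not fill a whole
// number of registers, so part of the last register is left undefined.
constexpr bool needs_undef(unsigned simd_width, unsigned type_bytes)
{
   return (simd_width * type_bytes) % kRegSizeBytes != 0;
}

// SIMD16 of a 2-byte type (W/UW) writes 16 * 2 = 32 bytes: a full register.
static_assert(!needs_undef(16, 2), "SIMD16 W/UW fills a register");
// SIMD8 of a 2-byte type writes only 16 bytes: half a register.
static_assert(needs_undef(8, 2), "SIMD8 W/UW leaves half a register");
```

Under these assumptions the UNDEF is only useful when the write is narrower than the register, which is why the full-register case can skip it.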

No change in fossil-db.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>
2025-03-28 13:40:07 +00:00
Kenneth Graunke
771e65b0db brw: Emit UNDEF as needed in SSA-style builder helpers
Should prevent regressions in a future commit.
fossil-db does show small changes, but it ends up a wash at 0.0%.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>
2025-03-28 13:40:07 +00:00
Kenneth Graunke
b89e269a46 brw: Make a helper to emit UNDEF for temporaries containing small types
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31833>
2025-03-28 13:40:07 +00:00
Sagar Ghuge
191d1e7345 intel/compiler: Don't lower 64bit data memory access on LSC
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34189>
2025-03-28 03:07:56 +00:00
Lionel Landwerlin
4db4bd1d04 brw: always write the VUE header
In 35df3925ca ("brw: ensure VUE header writes in HS/DS/GS stages") I
misread the PRMs and thought that the VF would initialize the header.

What actually happens is that the VF does not write valid values in
there and the PRMs explicitly say that the VS shader should overwrite
whatever is in there.

We could avoid writing the header in some cases when no HW is going to
read back the header. For example with rendering disabled through
3DSTATE_STREAMOUT::RenderingDisable. But those cases are dynamic and
the compiler is not able to tell. So just always write the header.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 35df3925ca ("brw: ensure VUE header writes in HS/DS/GS stages")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12880
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34211>
2025-03-27 07:42:23 +00:00
Caio Oliveira
72aefea0a0 brw: Fix disassembler trying to decode 3src_hstride in Gfx9
This field is not encoded for Gfx9, so use the fixed value
that makes sense for that platform.

Fixes: 9dfff2cb14 ("brw: Allow generating destination with stride 2 in 3-src instructions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12881
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34197>
2025-03-26 18:12:46 +00:00
Caio Oliveira
e384ccde28 brw: Expand EU validation for DPAS
Allow BFloat16 types when supported and allow destination/accumulator to
match the other source types in Gfx20+.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34035>
2025-03-25 07:38:08 +00:00
Caio Oliveira
6cec413a78 brw: Add EU assembler support for bfloat16
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
e37b707bd0 brw: Consider bfloat16 in scoreboard
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
62323a934b brw: Add BRW_TYPE_BF validation
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
9916cc1050 brw: Add BRW_TYPE_BF for bfloat16
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
d1f4fb8eee brw: Make some integer check more explicit
Use the positive ("is int?") check when applicable.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
c3d2ba6973 brw: Remove prefix gfx10 from enum types
The values already use BRW, make it consistent.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
9dfff2cb14 brw: Allow generating destination with stride 2 in 3-src instructions
Will be useful for testing BFloat16 in later patches.  No change
expected to the compiler itself.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
676b874ca9 brw: Fix decoding of 3-src destination stride in EU validation
Fixes: f1036da345 ("intel/brw: Add vstride/width/hstride to brw_hw_decoded_inst")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33664>
2025-03-25 05:23:37 +00:00
Caio Oliveira
89a87fab66 brw: Remove extra SHADER_OPCODE_FLOW emitted during NIR conversion
The DO() helper already emits a FLOW.

Fixes: d2c39b1779 ("intel/brw: Always have a (non-DO) block after a DO in the CFG")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33954>
2025-03-25 02:05:26 +00:00
Caio Oliveira
c01655370d brw: Add assembler support for DPAS
Allow us to parse instructions in the form we currently generate:

```
dpas.8x8(8)     g55<1>F         g47<1,1,0>F     g31<1,1,0>HF    g39<1,1,0>HF { align1 WE_all 1Q $4 };
```

Regions are not really needed, but this will be handled in a later patch
(that will also stop printing the regions).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34031>
2025-03-25 01:40:02 +00:00
Connor Abbott
7a55e13939 nir, compiler: Rename needs_quad_helper_invocations
This currently treats coarse and fine derivatives the same, but Qualcomm
needs to know whether just coarse derivatives are used or fine
derivatives/quad ops are also used. Rename this to
needs_coarse_quad_helper_invocations to make clear the difference from
the new field, needs_full_quad_helper_invocations.

Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 264d8a6766 ("ir3: Set need_full_quad depending on info.fs.require_full_quads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>
2025-03-14 21:55:57 +00:00
Matt Turner
ed42dc56f5 intel/compiler: Use correct enum type
Fixes: ce7208c3ee ("brw: add support for texel address lowering")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34014>
2025-03-13 20:11:10 +00:00
Matt Turner
d5dcc6a5c4 intel/compiler: Add missing breaks
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34014>
2025-03-13 20:11:10 +00:00
Matt Turner
0a63d629fe intel/compiler: Use unreachable instead of assert(!"...")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34014>
2025-03-13 20:11:10 +00:00