Commit graph

735 commits

Author SHA1 Message Date
Ian Romanick
7eab2cb67e brw/nir: Treat load_workgroup_id as convergent
v2: Fix for Xe2.

shader-db:

Lunar Lake Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18096526 -> 18096500 (<.01%)
instructions in affected programs: 6759 -> 6733 (-0.38%)
helped: 9 / HURT: 3

total cycles in shared programs: 921727804 -> 921841300 (0.01%)
cycles in affected programs: 110049730 -> 110163226 (0.10%)
helped: 90 / HURT: 372

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20496591 -> 20496402 (<.01%)
instructions in affected programs: 48757 -> 48568 (-0.39%)
helped: 25 / HURT: 8

total cycles in shared programs: 875253948 -> 875237902 (<.01%)
cycles in affected programs: 56760140 -> 56744094 (-0.03%)
helped: 363 / HURT: 34

total spills in shared programs: 4555 -> 4546 (-0.20%)
spills in affected programs: 174 -> 165 (-5.17%)
helped: 2 / HURT: 0

total fills in shared programs: 5243 -> 5224 (-0.36%)
fills in affected programs: 382 -> 363 (-4.97%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 141811577 -> 141811551 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22173792370 -> 22183128332 (+0.04%); split: -0.00%, +0.04%
Max live registers: 48053498 -> 48053415 (-0.00%)

Totals from 3911 (0.71% of 551443) affected shaders:
Instrs: 2164804 -> 2164778 (-0.00%); split: -0.00%, +0.00%
Cycle count: 2404062476 -> 2413398438 (+0.39%); split: -0.02%, +0.41%
Max live registers: 413583 -> 413500 (-0.02%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
6fab1b77c2 brw/nir: Treat some load_uniform as convergent
No shader-db changes on any Intel platform.

v2: Fix for Xe2.

v3: Rework the way that we determine that an intrinsic can actually be
convergent. This will now depend on whether or not the important
sources have previously be determined to be convergent. Fixes
intermitent failures in some test cases (including
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.push_constant_float_16_to_32.scalar_frag).

v4: s/the it/it/ in a comment. Noticed by Ken.

fossil-db:

No fossil-db changes on Lunar Lake.

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152743449 -> 152743161 (-0.00%)
Cycle count: 17399179660 -> 17399193488 (+0.00%)

Totals from 144 (0.02% of 633314) affected shaders:
Instrs: 5936 -> 5648 (-4.85%)
Cycle count: 51616 -> 65444 (+26.79%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150646195 -> 150645907 (-0.00%)
Cycle count: 15618427818 -> 15618428942 (+0.00%)

Totals from 144 (0.02% of 632567) affected shaders:
Instrs: 6218 -> 5930 (-4.63%)
Cycle count: 39968 -> 41092 (+2.81%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
341e5117ec brw/nir: Treat load_const as convergent
opt_combine_constants goes to great effort to pack 8 constants into a
single register, this can't have much effect.

There is a lot of fossil-db variation among platforms, but the results
are generally positive.

v2: Fix for Xe2.

shader-db:

Lunar Lake
total instructions in shared programs: 18095100 -> 18092845 (-0.01%)
instructions in affected programs: 158931 -> 156676 (-1.42%)
helped: 423 / HURT: 0

total cycles in shared programs: 921523326 -> 921522784 (<.01%)
cycles in affected programs: 7522774 -> 7522232 (<.01%)
helped: 225 / HURT: 228

LOST:   1
GAINED: 7

Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820211 -> 19820303 (<.01%)
instructions in affected programs: 53087 -> 53179 (0.17%)
helped: 135 / HURT: 1

total cycles in shared programs: 906380523 -> 906383031 (<.01%)
cycles in affected programs: 1402315 -> 1404823 (0.18%)
helped: 156 / HURT: 100

LOST:   1
GAINED: 16

fossil-db:

Lunar Lake
Totals:
Instrs: 141876801 -> 141783010 (-0.07%); split: -0.07%, +0.00%
Subgroup size: 10994624 -> 10994704 (+0.00%)
Cycle count: 22173441950 -> 22172949188 (-0.00%); split: -0.01%, +0.01%
Spill count: 69850 -> 69890 (+0.06%); split: -0.00%, +0.06%
Fill count: 129285 -> 128877 (-0.32%)
Max live registers: 48047900 -> 48043650 (-0.01%); split: -0.01%, +0.00%

Totals from 29837 (5.41% of 551396) affected shaders:
Instrs: 7842512 -> 7748721 (-1.20%); split: -1.23%, +0.03%
Subgroup size: 940320 -> 940400 (+0.01%)
Cycle count: 3444846368 -> 3444353606 (-0.01%); split: -0.09%, +0.08%
Spill count: 23358 -> 23398 (+0.17%); split: -0.01%, +0.18%
Fill count: 52296 -> 51888 (-0.78%)
Max live registers: 3183481 -> 3179231 (-0.13%); split: -0.16%, +0.03%

Meteor Lake
Totals:
Instrs: 152709353 -> 152666543 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17397176906 -> 17397668904 (+0.00%); split: -0.00%, +0.01%
Fill count: 147896 -> 147893 (-0.00%)
Max live registers: 31862891 -> 31861888 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20913 (3.30% of 633046) affected shaders:
Instrs: 6676676 -> 6633866 (-0.64%); split: -0.64%, +0.00%
Cycle count: 1498330125 -> 1498822123 (+0.03%); split: -0.06%, +0.09%
Fill count: 41010 -> 41007 (-0.01%)
Max live registers: 1799295 -> 1798292 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12880 -> 14992 (+16.40%); split: +33.29%, -16.89%

DG2 and Tiger Lake had similar results. (DG2 shown)
Totals:
Instrs: 152730878 -> 152688139 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17394835605 -> 17394179808 (-0.00%); split: -0.01%, +0.00%
Max live registers: 31862843 -> 31861840 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20912 (3.30% of 633046) affected shaders:
Instrs: 6563021 -> 6520282 (-0.65%); split: -0.65%, +0.00%
Cycle count: 1201999616 -> 1201343819 (-0.05%); split: -0.08%, +0.03%
Max live registers: 1798392 -> 1797389 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12872 -> 14984 (+16.41%); split: +33.31%, -16.90%

Ice Lake
Totals:
Instrs: 151914872 -> 151868108 (-0.03%)
Cycle count: 15262958696 -> 15262665082 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32194225 -> 32193192 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5650880 -> 5650608 (-0.00%); split: +0.02%, -0.03%

Totals from 22192 (3.48% of 637223) affected shaders:
Instrs: 6419739 -> 6372975 (-0.73%)
Cycle count: 184733818 -> 184440204 (-0.16%); split: -0.36%, +0.20%
Max live registers: 1989950 -> 1988917 (-0.05%); split: -0.05%, +0.00%
Max dispatch width: 5744 -> 5472 (-4.74%); split: +23.40%, -28.13%

Skylake
Totals:
Instrs: 141027379 -> 140811741 (-0.15%)
Cycle count: 14817704293 -> 14817418611 (-0.00%); split: -0.01%, +0.01%
Max live registers: 31628796 -> 31627791 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5535176 -> 5539880 (+0.08%); split: +0.14%, -0.06%

Totals from 22218 (3.53% of 628840) affected shaders:
Instrs: 5944856 -> 5729218 (-3.63%)
Cycle count: 182845101 -> 182559419 (-0.16%); split: -0.60%, +0.44%
Max live registers: 1974576 -> 1973571 (-0.05%); split: -0.07%, +0.02%
Max dispatch width: 16912 -> 21616 (+27.81%); split: +46.93%, -19.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
5ea9ed4798 brw/nir: Prepare try_rebuild_source for scalar values
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
d5d7ae22ae brw/nir: Fix up handling of sources that might be convergent vectors
Sources that are scalars (almost all source) and convergent generally
want <0,1,0> source stride. Sources that are vectors (e.g., texture
coordinates, SSBO write data, etc.) and convergent want no extra strides
applied. In nearly all cases LOAD_PAYLOAD lowering will do the right
thing.

v2: Use VEC in emit_pixel_interpolater_send. Suggested by Ken.

v3: With the elimination of offset_to_component(), offset() may not
convert an is_scalar source to have a zero stride. Explicitly do this
in get_nir_src and prepare_alu_destination_and_sources.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Caio Oliveira
93dfe504f2 intel/brw: Add SHADER_OPCODE_READ_FROM_CHANNEL and LIVE_CHANNEL
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32412>
2024-12-14 11:38:14 -08:00
Paulo Zanoni
0dc2a5808e brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM
The last argument seems to be used as brw_shader_reloc::delta (from
brw_add_reloc), and we're unconditionally setting it to 0 here, while
the other place where we handle nir_intrinsic_load_reloc_const_intel
seems to be setting the base appropriately.

I found this by inspection while debugging a bug related to this code,
so I'm not aware of any workloads that get improved by this patch.

Related patches:
 - ecbec25e84 ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic")
 - 99047451c9 ("intel/fs: add plumbing for embedded samplers")

Fixes: ecbec25e84 ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32531>
2024-12-09 15:45:49 +00:00
Sagar Ghuge
9afb0480c4 intel/compiler: Extend nir_intrinsic_load_topology_id_intel for xe3
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32426>
2024-12-04 19:20:51 +00:00
Kenneth Graunke
01680a66a9 brw: Simplify choose_oword_block_size_dwords()
Just calculate the block size using util_logbase2() - it's simpler.

Also drop the name "oword" as this refers to legacy HDC messages,
rather than the newer LSC "vector size" field.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
e703ff5e02 brw: Only consider components read for UBO loads
This will matter more with overfetching, where we may suggest loading
additional data that we don't actually need for vectorization purposes.

We want to make sure that push ranges have the data we actually need;
any extra padding is irrelevant.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
8c795af0b8 brw: Drop a few crocus references in comments
crocus no longer uses brw.  It uses elk.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:32 +00:00
Lionel Landwerlin
ba3ff8b3bb brw: move barycentric_mode enum to intel_shader_enums.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
2024-11-26 13:05:30 +00:00
Lionel Landwerlin
bfcb9bf276 brw: rename brw_sometimes to intel_sometimes
Moving it to intel_shader_enums.h

The plan is to make it visible to OpenCL shaders.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
2024-11-26 13:05:30 +00:00
Caio Oliveira
8474dc853d intel/brw: Add SHADER_OPCODE_QUAD_SWAP
For the horizontal, vertical and diagonal variants.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31053>
2024-11-22 00:27:01 +00:00
Caio Oliveira
2bd7592b0b intel/brw: Add SHADER_OPCODE_BALLOT
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31052>
2024-11-21 19:32:59 +00:00
Kenneth Graunke
5848035443 brw: Fix try_rebuild_source's ult32/ushr handling to use unsigned types
We were accidentally doing a signed integer comparison here for ult32,
or a sign-extending shift for ushr.

One notable bit of fallout was that load_global_uniform_block_intel
address calculations broke on platforms that don't have native 64-bit
integer support, as the iadd64 lowering for "do I need to carry?" was
using ult32...and performing the wrong comparison.  We spotted this in
Borderlands 3 on Alchemist once we turned on other optimizations.

Thanks to Lionel Landwerlin for helping spot the problem!

Fixes: c7b312ad45 ("brw: factor out source extraction for rematerialization")
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>
2024-11-18 12:55:47 +00:00
Ian Romanick
2a57568ebd brw/build: Add scalar_group() helper
Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is
deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems
different, so I decided to keep it.

The next commit wants to use this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Caio Oliveira
019770f026 intel/brw: Add SHADER_OPCODE_VOTE_*
Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL.  The first two
are also used for the quad variants.  Move their lowering from
NIR conversion to brw_lower_subgroup_ops.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Caio Oliveira
d97381efd8 intel/brw: Add fs_builder::BROADCAST() helper
Include in the helper which already take care of using exec_all() and
taking the first component of the result.  Both are expected by
SHADER_OPCODE_BROADCAST.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Lionel Landwerlin
97b17aa0b1 brw/nir: rework inline_data_intel to work with compute
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in the compute shaders on Gfx12.5+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
b2c5ca0ade brw: remove rebuild single element special case
No shader-db difference on DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
19eb601cfc brw: avoid clashing nested loop indices
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Kenneth Graunke
dea61b7399 intel/brw: Fix register and builder size in emit_barrier() for Xe2
We were manually allocating 1 REG_SIZE for the barrier payload, which is
only half a register on Xe2.  This should eventually get allocated to a
whole register anyway, but it's awkward in the meantime.  Also, we were
zero-initializing the header using group(8, 0) which only initialized
half the register.  The rest of the fields are Reserved MBZ, so they're
likely unused and unread anyway - but it's better to zero-initialize
them so we don't get random undefined, miserable-to-debug behavior.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
7c9eb8b289 intel/brw: Make a ubld temporary in emit_barrier()
Saves typing .exec_all() in a lot of places.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
a9d9488788 intel/brw: Delete Gfx7-8 code from emit_barrier()
Those are supported by elk, not brw.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
c747c1e1f4 intel/brw: Fix spill/fill count for load/store_scratch in SIMD32
Honestly, I don't know what I was thinking - we are emitting a single
spill/fill message here, but were counting it as 2 spill/fills in SIMD32
shaders.  So our eventual shader stat reporting would subtract the
number of spills and fills from send_count, and get a negative number,
wrapping around to just shy of UINT32_MAX.  That's way too many sends.

This is especially noticable on Xe2 which often uses SIMD32 shaders.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Caio Oliveira
0ba1159b0a intel/brw: Add SHADER_OPCODE_*_SCAN
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
9537b62759 intel/brw: Add SHADER_OPCODE_REDUCE
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
affa7567c2 intel/brw: Add phases to backend
The general idea is to be able to validate that certain instructions
were lowered and certain restrictions were already handled.  Passes can
now assert their expectations, i.e. if a pass is mean to run after
certain lowerings or not.

The actual phases are a initial stab and as we re-organized the passes,
we may remove/add phases.

This commit just add some phase steps, later commits will make use of
them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Sviatoslav Peleshko
57344052b6 intel/brw: Don't apply discard_if condition opt if it can change results
We can't just always negate the alu instruction's cmod, because negating
it can produce different results when the argument is NaN float. We can
still do that if the condition is == or !=.

Fixes: 0ba9497e ("intel/fs: Improve discard_if code generation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11800
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31042>
2024-09-27 11:52:27 +00:00
Lionel Landwerlin
eeb5f6e8c8 brw: make sampler message emission more generic
We can generalize the simd8-16bits case by just rounding to a physical
register.

We also take the opportunity to limit the register allocation to a
single physical GRF for the residency data.

Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com>
Fixes: 0116430d39 ("intel/brw: Handle 16-bit sampler return payloads")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>
2024-09-25 10:22:40 +00:00
Lionel Landwerlin
45377dc5c4 brw: fix vecN rebuilds
When loading a 64bit address from the push constants, we'll load a
vec2, so we need to allocate 2 GRFs and MOV each component.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11831
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>
2024-09-17 14:22:23 +00:00
Lionel Landwerlin
c16b27f66f brw: use a builder of the size of the physical register for uniforms
Should avoid any partial write non-sense on Xe2+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>
2024-09-17 14:22:23 +00:00
Kenneth Graunke
7090578c35 intel/brw: Switch load_ubo_uniform_block_intel over to memory intrinsics
While there are many cases that turn into the *_PULL_CONSTANT_LOAD ops
or push constants, this one piece was emitting surface block loads.
Switch it over to use the new intrinsics to delete a bunch of code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
b55f77161d intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes
We introduce a new fs_nir_emit_memory_access() helper that can handle
image, bindless image, SSBO, shared, global, and scratch memory, and
handles loads, stores, atomics, and block loads.  It translates each
of these NIR intrinsics into the new MEMORY_*_LOGICAL intrinsics.

As a result, we delete a lot of similar surface access emitter code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
3ba97176d6 intel/brw: Switch load_num_workgroups to the new memory intrinsic
A simple case we handle directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
8a6903e50d intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop"
This is going to handle more than atomics shortly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Ian Romanick
73f365e208 intel/brw: load_offset cannot be constant on this path
Literally inside an if-statement (about 26 lines before this hunk)
that checks for !nir_src_is_const(instr->src[1]).

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Caio Oliveira
695f5314d6 intel/brw: Simplify fs_inst annotation
When INTEL_DEBUG=ann is also set, the disassembler would annotate the
output with either a string or the string verison of a NIR instruction.
This was done by keeping two pointers (but only using one at a time).

Change the code to print the instruction into a string instead of
keeping it pointer around (peg the string to the shader).  That way,
only one pointer is needed for annotations.  Because that serialization
is not free, only do that when the environment variable is set.

Since we are here, move the annotation string field to the end, moving
it to the least commonly used cacheline.  Further packing might allow
the entire fs_inst to fit in two cachelines.

For release builds, don't even add the debug annotation to the struct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>
2024-08-28 03:59:50 +00:00
Caio Oliveira
31dfb04fd3 intel/brw: Remove long register file names
The long names were originally meant to map to the HW encoding but
nowadays the actual encoding values depend on gfx version, whether
instruction is 3src, etc.

Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
d31c8bfb6f intel/brw: Remove more uses of variable length arrays
In these cases there's a clear bound we can use.  In C++ this is a
compiler extension and not compatible with zero initializing a
regular struct -- which will happen in a later change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
86c20e2910 intel/brw: Use a helper for common VEC pattern
In the helper, instead of using the Variable Length Array, use a
fixed size array to NIR_MAX_VEC_COMPONENTS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
abc535a3b4 intel/brw: Remove unused variable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:13 +00:00
Francisco Jerez
71ca8529c5 intel/brw/gfx12.5+: Fix IR of sub-dword atomic LSC operations.
We were currently emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:

> untyped_atomic_logical(32) dst<1>:HF, ...

However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc.  This is why the hacks that manually set the
'inst->size_written' of the instruction were required.

In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.

Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.

Instead, have the logical operation implement the behavior of 16-bit
destinations correctly instead of silently replacing the 16-bit region
with an inconsistent 32-bit region -- This is done by emitting the MOV
instructions used to pack the data from the UD temporary into the
packed destination from the lower_logical_sends() pass instead of from
the NIR translation pass.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
2024-08-21 02:33:12 +00:00
Sagar Ghuge
c4f2a8d984 intel/compiler: Fix indirect offset in GS input read for Xe2+
Make sure to take new GRF size into consideration and adjust the
indirect offset according to new size so that when we do the indirect
load with address register, we load right values.

This helps pass the following tests:
   - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.*geom*
   - dEQP-VK.ray_query.*geometry_shader.*

Backport-to: 24.2
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>
2024-08-16 18:40:13 +00:00
Ian Romanick
c8038643b8 intel/brw: Make ifind_msb SSA friendly
No shader-db changes on any Intel platform.

v2: Use negate(tmp) instead of creating a new temporary. Suggested by
Ken.

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06%

Totals from 40 (0.01% of 633223) affected shaders:
Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00%
Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11%
Spill count: 59795 -> 59794 (-0.00%)
Fill count: 103513 -> 103509 (-0.00%)

Totals from 40 (0.01% of 632445) affected shaders:
Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00%
Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43%
Spill count: 16896 -> 16895 (-0.01%)
Fill count: 27819 -> 27815 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Ian Romanick
e9c151fde6 intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00%
Spill count: 78571 -> 78525 (-0.06%)
Fill count: 148178 -> 148132 (-0.03%)

Totals from 210 (0.03% of 633223) affected shaders:
Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08%
Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00%
Spill count: 15632 -> 15586 (-0.29%)
Fill count: 26241 -> 26195 (-0.18%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Lionel Landwerlin
fbafa9cabd intel/nir: remove load_global_const_block_intel intrinsic
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
2024-08-16 11:12:39 +00:00
Sagar Ghuge
c3c62e493f intel/compiler: Ray query requires write-back register
Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable

   "When this bit is set in the header, Trace Ray Message behaves like a
   Ray Query. This message requires a write-back message indicating
   RayQuery for all valid Rays (SIMD lanes) have completed."

If we don't pass the write-back register, somehow it was stepping on
over R0 register and can mess up the scratch space accesses which could
potentially lead to GPU hang. It can be noticed while running it under
simulator trace.

send.rta (16|M0)         null     r124  r126:1  0x0            0x02000100           {$15} // wr:1+1, rd:0; simd16 trace ray
R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig
eec02246f8 brw: switch to derivative intrinsics
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>
2024-08-09 17:07:59 +00:00