Commit graph

239 commits

Author SHA1 Message Date
Kenneth Graunke
a90edad9f7 intel/brw: Fix generate_mov_indirect to check has_64bit_int not float
We are overriding the type to Q/UQ, so we need to split to two MOVs
if 64-bit integer math is not supported.  For reference, Meteorlake
does support 64-bit floats but would still not work correctly here.

See also brw_broadcast(), which does similar indirects but correctly
checks has_64bit_int instead of has_64bit_float.

Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Caio Oliveira
dae9795628 intel/brw: Remove vestiges of sources on IF opcode, only valid on Gfx6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
2024-03-29 22:44:01 +00:00
Rohan Garg
48376ac3b8 intel/brw: Cleanup send generation
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>
2024-03-20 15:46:44 -07:00
Kenneth Graunke
97bf3d3b2d intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND
There's no need for special handling here, it's just a send message
with a trivial g0 header and descriptor.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>
2024-03-05 11:16:20 +00:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Caio Oliveira
803a1a5ada intel/brw: Remove automatic_exec_sizes
As Ken describes: "This was only used by legacy SF/Clip/FFGS programs."

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
dae59e7078 intel/brw: Remove runtime_check_aads_emit
It was used for Gfx4 payload.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
8f3c52c1da intel/brw: Remove MRF type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5c93a0e125 intel/brw: Remove Gfx8- remaining opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
99d41ca90d intel/brw: Remove Gfx4-5 manual compression selection
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
f321e555b6 intel/brw: Remove Gfx8- code from EU emission
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
9569ea82a8 intel/brw: Remove Gfx8- code from generator
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Ian Romanick
8fb37ef985 intel/fs: Add fast path for ballot(true)
This doesn't help very much now. A later commit adds a NIR optimization
pass, tentatively called nir_opt_uniform_subgroup, that converts many
kinds of subgroup operations to things involving
bitCount(ballot(true)). This commit makes a huge difference in the
results of that later commit.

No shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165558033 -> 165557519 (-0.00%)
Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00%

Totals from 299 (0.05% of 656117) affected shaders:
Instrs: 88293 -> 87779 (-0.58%)
Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03%

v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:37:46 -08:00
Sagar Ghuge
6f0ab5e4d5 intel/compiler: Add texture gather offset LOD/Bias message support
v2: (Ian)
- Space formatting on conditional statement

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Sagar Ghuge
79af0ac29a intel/compiler: Add gather4_i/l/[_c]/b sampler message
v2: (Ian)
- Format comment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Kenneth Graunke
1497f4e0c2 intel/fs: Don't include sync.nop in instruction count statistics
With the advent of software scoreboarding, we emit sync instructions
in various places to synchronize the execution pipelines.  This results
in assembly being littered with a bunch of sync.nop instructions.  That
means that when you reorder anything in the program, the scoreboarding
changes, and the number of sync.nops can vary wildly - even if the code
isn't really materially better or worse.  This makes it hard to use
tools like shader-db or fossil-db on post-Icelake platforms.

For now, exclude sync.nops from the instruction count statistic.  One
day we may want to consider improving the software scoreboarding pass
to emit fewer redundant sync.nop instructions, at which point tracking
this as a separate stat might be useful.  For now though, it's simply
cluttering and confusing our results.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27701>
2024-02-20 22:26:09 +00:00
Caio Oliveira
468a0ffe9c intel/compiler: Include brw_disasm_info.h where its used
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27579>
2024-02-15 09:26:46 +00:00
Francisco Jerez
3e710a84ad intel/fs: Set the default execution group to 0 when not representable by the platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27165>
2024-01-20 19:55:31 +00:00
Francisco Jerez
6e56a4b474 intel/compiler/xe2: Fix for the removal of AccWrCtrl.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26860>
2024-01-12 20:18:03 +00:00
Sviatoslav Peleshko
dbf6f0291a intel/fs: Set group 0 for Wa_14010017096 MOV instruction
We always set exec size to 16 for this MOV, but the execution group remains
from the previous emitted instruction. This can cause emitting a group
which violates PRM restriction for ChanOff: "The execution size (ExecSize)
must be a factor of the chosen offset."

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>
2024-01-09 11:35:52 +00:00
Sviatoslav Peleshko
b5c0b90402 intel/compiler: Set flag reg to 0 when disabling predication
Having the reg set with predication disabled shouldn't cause any problems
during the execution. But when decompiling such instruction the flag won't
be shown in the output, so the recompiling will cause
functionally-identical but binary-different code. Fixing this makes
disasm/asm testing easier.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>
2024-01-09 11:35:52 +00:00
Sviatoslav Peleshko
4f41c44df2 intel/compiler: Add variable to dump binaries of all compiled shaders
This can be useful for testing i965_disasm and i965_asm by comparing
bin -> asm -> bin results.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>
2024-01-09 11:35:51 +00:00
Ian Romanick
e666872c75 intel/compiler: Initial bits for DPAS instruction
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.

v3: Prevent lower_regioning from trying to "fix" DPAS sources.

v4: Add instruction latency information for scheduling and perf
estimates.

v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.

v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Francisco Jerez
1eff2fcb62 intel/compiler: Add polygon count statistic to brw_compile_stats.
And use it in ANV in order to return a "SIMDNxM" name from
vkGetPipelineExecutablePropertiesKHR.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:30 +00:00
Rohan Garg
2444a3cd46 intel/compiler: migrate WA 14013672992 to use WA framework
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26006>
2023-11-02 16:39:25 +00:00
Kenneth Graunke
fddad4d5f9 intel/compiler: Assert that FS_OPCODE_[REP_]FB_WRITE is for pre-Gfx7
We use SHADER_OPCODE_SEND directly instead of FS_OPCODE_FB_WRITE (for a
while now) and FS_OPCODE_REP_FB_WRITE (since the previous commit).

Assert that it isn't used on Gfx7+.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Francisco Jerez
8b1dc77521 intel/fs/xe2+: Scale BRW_MAX_MSG_LENGTH by native register size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Felix DeGrood
d04be9770b intel/compiler: use shader source hash in shader dump code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Lionel Landwerlin
3384f029be intel/compiler: rework input parameters
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Lionel Landwerlin
049c791a63 intel/fs: fix pull-constant-load prior to gfx7
In ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
we added a new source, we need to fixup the source index for the
generator.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23405>
2023-06-06 14:47:41 +00:00
Lionel Landwerlin
6d6877bf99 intel/fs: enable extended bindless surface offset
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
e09cfda0de intel/fs: lower get_buffer_size like other logical sends
This will also enable the use of the bindless heap.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:36 +00:00
Lionel Landwerlin
2acc2f18ea intel/compiler: report max dispatch width statistic
Most tools looking at shader stats assume that there is only a single
resulting binary shader out of a single input. On Intel HW this is not
always the case. So having a statistic on each variant that reports
the maximum dispatch width helps showing improvement on a single
shader in terms of how large we manage to compile it.

For shaders that can be compiled in multiple SIMD width (like fragment
shaders), this will report the maximum dispatch width in the
statistics of each variants.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22014>
2023-03-21 11:53:04 +00:00
Sagar Ghuge
9a34b2ab0e intel/compiler: Add swsb_stall debug option
When enabled, on gfx12 plus, we will add the sync nop instruction after
each instruction to make sure that current instruction depends on the
previous instruction explicitly.

This option will help us to get a hint if something is missing or broken
in software scoreboard pass.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>
2023-03-10 06:55:39 +00:00
Kenneth Graunke
dfe652fb03 intel/eu: Simplify brw_F32TO16 and brw_F16TO32
Now that we aren't using them on Gfx8+ we can drop a lot of cruft.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
c590a3eadf intel/fs: Move packHalf2x16 handling to lower_pack()
This mainly lets the software scoreboarding pass correctly mark the
instructions, without needing to resort to fragile manual handling in
the generator.

We can also make small improvements.  On Gfx 8LP-12.0, we no longer have
the restrictions about DWord alignment, so we can simply write each half
into its intended location, rather than writing it to the low DWord and
then shifting it in place.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Lionel Landwerlin
09cdb77a92 intel/fs: report max register pressure in shader stats
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21756>
2023-03-08 13:37:07 +00:00
Mark Janes
b96019f82b intel/fs: use generated workaround helpers for Wa_14010017096
This workaround does not apply beyond gen 12.0.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21746>
2023-03-07 00:10:33 +00:00
Jason Ekstrand
714a291673 intel/compiler: Use SHADER_OPCODE_SEND for PI messages
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Lionel Landwerlin
13cca48920 intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.

The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.

No shader-db changes on TGL & DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
2023-01-26 11:26:53 +00:00
Jason Ekstrand
c80c0ed943 intel/fs: Always use integer types for indirect MOVs
There's a new Gen12.5 restriction which forbids using the VxH or Vx1 on
the floating-point pipe.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
37b3601052 intel/fs: switch register allocation spilling to use LSC on Gfx12.5+
v2: drop the hardcoded inst->mlen=1 (Rohan)

v3: Move back to LOAD/STORE messages (limited to SIMD16 for LSC)

v4: Also use 4 GRFs transpose loads for fills (Curro)

v5: Reduce amount of needed register to build per lane offsets (Curro)
    Drop some now useless SIMD32 code
    Unify unspill code

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
2022-08-24 17:51:40 +00:00
Kenneth Graunke
9afd955353 intel/compiler: Delete unused Gfx8+ code in brw_find_live_channel()
We now handle this in fs_visitor::lower_find_live_channel().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17530>
2022-08-02 08:41:43 +00:00
Ian Romanick
bbcb881f46 intel/fs: Remove non-_LOGICAL URB messages
The _LOGICAL versions are lowered direct to SEND, so nothing can ever
generate these messages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Kenneth Graunke
72e9843991 intel/compiler: Introduce a new brw_isa_info structure
This structure will contain the opcode mapping tables in the next
commit.  For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Ian Romanick
676acfe956 intel/fs: Add missing synchronization for WaW dependency
v2: Do the synchronization in the correct place.  Noticed by Curro.

Fixes: b5fa43952a ("intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17037>
2022-06-17 17:05:43 +00:00
Jason Ekstrand
dfedeccc13 intel: Only set VectorMaskEnable when needed
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.

Shader-db reports no instruction or cycle-count changes.  However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.

v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization.  However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
2022-05-27 21:52:48 +00:00
Kenneth Graunke
9886615958 intel/compiler: Move spill/fill tracking to the register allocator
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends.  Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.

But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.

So counting stateless messages is not accurate.  Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
2022-05-25 06:56:01 +00:00
Ian Romanick
b5fa43952a intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT
I noticed that a *LOT* of fragment shaders in Shadow of the Tomb Raider,
for instance, end up with a sequence of NIR like:

    vec1 32 ssa_2 = load_const (0x00000000 = 0.000000)
    ...
    vec1 32 ssa_191 = pack_half_2x16_split ssa_188, ssa_2
    vec1 32 ssa_192 = pack_half_2x16_split ssa_189, ssa_2
    vec1 32 ssa_193 = pack_half_2x16_split ssa_190, ssa_2

This results in an assembly sequence like:

    mov(8)          g28<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g28<8,8,1>F
    shl(8)          g21<1>UD        g21<8,8,1>UD    0x00000010UD
    mov(8)          g21<2>HF        g25<8,8,1>F
    mov(8)          g19<2>HF        g28<8,8,1>F
    shl(8)          g19<1>UD        g19<8,8,1>UD    0x00000010UD
    mov(8)          g19<2>HF        g23<8,8,1>F
    mov(8)          g20<2>HF        g28<8,8,1>F
    shl(8)          g20<1>UD        g20<8,8,1>UD    0x00000010UD
    mov(8)          g20<2>HF        g24<8,8,1>F

After this commit, the generated assembly is:

    mov(8)          g21<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g23<8,8,1>F
    mov(8)          g19<1>UD        0x00000000UD
    mov(8)          g19<2>HF        g17<8,8,1>F
    mov(8)          g20<1>UD        0x00000000UD
    mov(8)          g20<2>HF        g18<8,8,1>F

Tiger Lake, Ice Lake, Skylake, and Haswell had similar results. (Ice Lake shown)
total instructions in shared programs: 20119086 -> 20119034 (<.01%)
instructions in affected programs: 9056 -> 9004 (-0.57%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 16 x̄: 6.50 x̃: 4
helped stats (rel) min: 0.29% max: 1.75% x̄: 1.00% x̃: 0.98%
95% mean confidence interval for instructions value: -11.01 -1.99
95% mean confidence interval for instructions %-change: -1.56% -0.44%
Instructions are helped.

total cycles in shared programs: 861019414 -> 861021044 (<.01%)
cycles in affected programs: 279862 -> 281492 (0.58%)
helped: 4
HURT: 2
helped stats (abs) min: 6 max: 936 x̄: 239.00 x̃: 7
helped stats (rel) min: 0.03% max: 8.13% x̄: 2.09% x̃: 0.09%
HURT stats (abs)   min: 18 max: 2568 x̄: 1293.00 x̃: 1293
HURT stats (rel)   min: 0.36% max: 1.14% x̄: 0.75% x̃: 0.75%
95% mean confidence interval for cycles value: -972.56 1515.89
95% mean confidence interval for cycles %-change: -4.77% 2.49%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 17812327 -> 17812263 (<.01%)
instructions in affected programs: 9867 -> 9803 (-0.65%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 28 x̄: 8.00 x̃: 4
helped stats (rel) min: 0.32% max: 1.80% x̄: 1.00% x̃: 0.95%
95% mean confidence interval for instructions value: -15.46 -0.54
95% mean confidence interval for instructions %-change: -1.54% -0.47%
Instructions are helped.

total cycles in shared programs: 904768620 -> 904773291 (<.01%)
cycles in affected programs: 454799 -> 459470 (1.03%)
helped: 4
HURT: 4
helped stats (abs) min: 36 max: 586 x̄: 344.50 x̃: 378
helped stats (rel) min: 0.47% max: 4.04% x̄: 2.01% x̃: 1.77%
HURT stats (abs)   min: 1 max: 5572 x̄: 1512.25 x̃: 238
HURT stats (rel)   min: <.01% max: 2.77% x̄: 1.46% x̃: 1.53%
95% mean confidence interval for cycles value: -1122.40 2290.15
95% mean confidence interval for cycles %-change: -2.26% 1.71%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 18581 -> 18579 (-0.01%)
spills in affected programs: 323 -> 321 (-0.62%)
helped: 1
HURT: 0

total fills in shared programs: 24985 -> 24981 (-0.02%)
fills in affected programs: 1348 -> 1344 (-0.30%)
helped: 1
HURT: 0

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 143585431 -> 143513657 (-0.0%)
Instructions helped: 14403

Cycles in all programs: 8439312778 -> 8439371578 (+0.0%)
Cycles helped: 10570
Cycles hurt: 3290

Gained: 146
Lost: 74

All of the lost and gained fossil-db shaders are SIMD32 fragment
shaders.  14,247 of the affected shaders are from Shadow of the Tomb
Raider.  154 are from Batman Arkham Origins, and the remaining two are
from Octopath Traveler.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15089>
2022-04-07 18:26:23 +00:00
Kenneth Graunke
6fa66ac228 intel/compiler: Implement nir_intrinsic_last_invocation
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V.  However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().

We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL.  A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00