Commit graph

649 commits

Author SHA1 Message Date
Caio Oliveira
40416850f1 intel/compiler: Re-enable opt_zero_samples() in many cases for Gfx12.5
The workaround applies specifically to Cube and Cube Arrays, so we can
still apply the optimization for the others.

Ideally we would like to pull opt_zero_samples logic into the lowering
sends -- to avoid adding a bit to communicate between passes.  However
the texture coordinates for the LOGICAL backend instructions, which
are a common target for the optimization, are combined into offsets over
a single VGRF, so we can't easily identify the constant cases.  The
copy-prop pass make this more visible for opt_zero_samples.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
2023-11-09 03:56:28 +00:00
Caio Oliveira
daeab51a62 intel/compiler: Re-enable opt_zero_samples() for Gfx7+
Inadvertently, because of a sequence of changes elsewhere, this pass
ended up not having any effect:

- Before Gfx5 the optimization is not applicable.

- On Gfx5-6 it doesn't apply because it sampler operations don't
  currently use LOAD_PAYLOAD, but write the MOVs directly.  Not clear to
  me whether they ever did.

- On Gfx7+ it doesn't apply anymore because now the logical sampler
  operations are now lowered directly to SENDs, and the is_tex() check
  would skip them.

Since the LOAD_PAYLOAD implementation applies for Gfx7+ only, rework the
pass to work again by handling SEND instructions.  To make the pass
easier, the optimization will happen before opt_split_sends() so only
one LOAD_PAYLOAD needs to be cared for.

Update the code to accept BAD_FILE sources in addition to zeros, these
are added in some cases as padding and effectively are don't care
values, so we can assume them zeros.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
2023-11-09 03:56:28 +00:00
Caio Oliveira
ef8553082e intel/compiler: Rework opt_split_sends to not rely/modify LOAD_PAYLOAD
This is a preparation to (re-)enable opt_zero_samples(), which will reduce
a SEND mlen before we split it.  When that happen, opt_split_sends()
won't be able to rely on the fact that mlen covers the entire
LOAD_PAYLOAD.

Since we are changing that, take the opportunity to also not modify the
existing LOAD_PAYLOAD, just create two new ones with the exact set of
sources.  This allows the pass to be further simplified by iterating
forward and not require live_variables analysis.

The helper function was added so can be used later for
opt_zero_samples().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
2023-11-09 03:56:28 +00:00
Caio Oliveira
e017bcae59 intel/compiler: Clarify the asserts in nir_load_workgroup_id lowering
For Task/Mesh WorkgroupID is now lowered to WorkgroupIndex by the
generic NIR pass, so we shouldn't hit this.  We can now simplify the
asserting code in emit_work_group_id_setup().

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25977>
2023-11-08 17:18:36 -08:00
Kenneth Graunke
48f60f4c4b intel/compiler: Convert the repclear shader to use send-from-GRF
Sandybridge uses this code and needs MRFs, but all other platforms send
from GRFs.  Do that directly rather than relying on the MRF hack.

Ivybridge and later also use SHADER_OPCODE_SEND directly rather than
a virtual opcode that's handled in the generator, so we follow suit.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
ef7d1b5f44 intel/compiler: Drop unused saturate handling in repclear shader
We never set key->clamp_fragment_color when compiling the BLORP fast
clear shaders.  Besides, we were setting saturate on an FB write opcode,
which...isn't even a thing.  We would need it on the MOV, and weren't
setting it there.  So it wouldn't have even worked.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
e6d9267d4f intel/compiler: Delete repclear shader's special case for 1 color target
This is basically just once through the loop but copy and pasted.  One
difference is that the single render target case used a headerless
message, and the multiple render target case always used headers.

Now we use headerless messages for the first render target, even in the
multiple render target case.  While we already have it set up for the
other RTs, it's still 2 fewer registers to send.  Minor improvement.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
e6460fe66b intel/compiler: Delete unused repclear shader uniform handling
A long time ago, we used a uniform for the clear color.  Back in 2014,
we added support for using a flat input instead, as this was easier for
Vulkan, but we left the option of using a uniform for OpenGL.

Eventually nobody used the uniform approach anymore, but the compiler
code to handle it remained.  Drop the dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
b35f1fc910 intel/compiler: Delete unused emit_dummy_fs()
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something.  In the olden days, we'd use it for
platform enabling.  I can't think of the last time we did that, though.

I also used to use it for debugging.  If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.

These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders.  So we don't
need to resort iterative driver hacks anymore.  Again, I can't think of
the last time I actually did that.

So, this code is basically just dead.  And it's using legacy MRF paths,
which we could update...or we could just delete it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Lionel Landwerlin
3f973a4f45 Revert "intel/fs: limit register flag interaction of FIND_*LIVE_CHANNEL"
This reverts commit c9739e8912.

We don't have a full understanding of what is going on but reverting
definitely fixes a hang.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c9739e8912 ("intel/fs: limit register flag interaction of FIND_*LIVE_CHANNEL")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9868
Tested-By: Valentin Geyer <trayshar@t-online.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25563>
2023-10-13 08:37:28 +03:00
Alyssa Rosenzweig
c39896b17b nir: Use getters for nir_src::parent_*
First, we need to give the parent_instr field a unique name to be able to
replace with a helper.  We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.

This was done with a combination of sed and manual fix-ups.

Then we use semantic patches plus manual fixups:

    @@
    expression s;
    @@

    -s->renamed_parent_instr
    +nir_src_parent_instr(s)

    @@
    expression s;
    @@

    -s.renamed_parent_instr
    +nir_src_parent_instr(&s)

    @@
    expression s;
    @@

    -s->parent_if
    +nir_src_parent_if(s)

    @@
    expression s;
    @@

    -s.renamed_parent_if
    +nir_src_parent_if(&s)

    @@
    expression s;
    @@

    -s->is_if
    +nir_src_is_if(s)

    @@
    expression s;
    @@

    -s.is_if
    +nir_src_is_if(&s)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
2023-10-10 04:58:05 -04:00
Ian Romanick
bac10ef4aa intel/fs: Add DP4A to get_lowered_simd_width
While working on cooperative matrix support, I noticed some invalid
DP4A instructions being generated.

    dp4a(32)    g33<1>UD    g21<8,8,1>UD    g1.0<0,1,0>UD   g9<1,1,1>UD

This violates the constraint that the destination or a source can only
access two consecutive GRFs.

I'm a little surprised that validation didn't catch this. Perhaps
because it's a 3 source instruction? Either way, it seems like a bigger
project to fix that.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25554>
2023-10-07 02:27:53 +00:00
Caio Oliveira
81bc09bf97 intel/fs: Tweak default case of fs_inst::size_read()
In the default case, there's a special case with a few conditions.
Prefer the cheapest conditions first, so we can take advantage of
short-circuiting.

Effect is a small but still significant reduce in shader compilation
times, as can be seen by:

- Fossil replay for Rise of the Tomb Raider

```
Difference at 95.0% confidence
	-0.433333 +/- 0.028609
	-1.42556% +/- 0.0941163%
	(Student's t, pooled s = 0.0337886)
```

- Fossil replay for Batman Arkham City

```
Difference at 95.0% confidence
	-8.84 +/- 0.146083
	-1.65932% +/- 0.0274207%
	(Student's t, pooled s = 0.125423)
```

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25549>
2023-10-06 09:16:56 +00:00
Sviatoslav Peleshko
8f23b45252 intel/fs: Fix "packed word exception" condition for register regioning
Fixes: a6bf5f88 ("i965/fs: Enforce common regioning restrictions by SIMD splitting.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9432
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25378>
2023-10-05 01:41:42 +00:00
Francisco Jerez
53d1d793cb intel/fs: Delete manual 'inst->mlen' calculations from all uses of logical URB writes.
Rework:
 * Marcin: update emit_urb_indirect_vec4_write

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
2023-09-27 23:57:25 +00:00
Francisco Jerez
34a2c9ce35 intel/fs: Specify number of data components of logical URB writes via control immediate.
This is what most logical SEND messages do when they take a variable
number of components.  'inst->mlen' is expected to be zero for logical
SEND opcodes, which are expected to behave like plain arithmetic
operations, so certain automated transformations (like SIMD lowering)
can manipulate them without opcode-specific special-casing.

Guessing the number of components from 'inst->mlen' has other
disadvantages, because it requires duplicating the logic that infers
the message payload size in every use of the instruction -- Instead we
can just do the computation once during logical send lowering.  In
addition on LNL platform this causes the 'inst->mlen' field of URB
writes to have units inconsistent with every other SEND instruction,
which is likely to lead to confusion and bugs down the road.

Rework:
 * Marcin: update emit_urb_indirect_vec4_write

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
2023-09-27 23:57:25 +00:00
Jordan Justen
c28539a2fe intel/compiler: Use enum xe2_lsc_cache_load on xe2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
2023-09-27 23:57:25 +00:00
Ian Romanick
feec9166cd intel/compiler/xe2: Handle new URB write messages
Rework:
 * idr v1: Fix compilation error.
 * idr v2: Add support for per-channel offsets.
 * idr v3: get_lowered_simd_width is 16 on Xe2+.
 * idr v4: Add disassembly support.  Add validation support.
 * Sqaushed in changes Marcin Ślusarz's patches:
   * "intel/compiler: skip adding 0 to payload address"
   * "intel/compiler/xe2: drop masking off top 8 bits of URB handle"

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
2023-09-27 23:57:25 +00:00
Caio Oliveira
c487ba26ca intel/compiler: Don't store stage name and abbrev
Those are used in the failure paths and are easily retriavable from the
stage itself, so no need to store them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25367>
2023-09-26 18:12:53 -07:00
Caio Oliveira
1cdc4be14b intel/compiler: Don't allocate memory for SIMD select error handling
The position in the error array already indicate the SIMD in question,
so take off all the formatted printing from the errors -- which in some
cases were just not needed.  We lose a little bit of extra context but
it is all easily derivable from the message and the SIMD.

This also will remove the overhead when SIMD selection is being used to
just to find the selected dispatch width -- at a point where the shaders
were already compiled -- and the errors are not used at all.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9849
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25336>
2023-09-22 16:23:02 +00:00
Jordan Justen
9846dd798b intel/compiler: Update opt_split_sends() for Xe2 reg size
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 23:06:04 -07:00
Jordan Justen
727ab2c11d intel/compiler/fs: Support Xe2 reg size in assign_curb_setup
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Caio Oliveira
8944ac7d6c intel/fs/xe2+: Update BS payload setup for Xe2 reg size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
14e1b9ee69 intel/fs/xe2+: Update TES payload setup for Xe2 reg size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
0b23df3951 intel/compiler/xe2: Update fs_visitor::setup_vs_payload to account for Xe2 reg size
[ Francisco Jerez: Simplify. ]

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
a573531785 intel/compiler/xe2+: Represent dispatch_grf_start_reg in native GRF units.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
17ef5e7ead intel/fs/xe2+: Allow increased SIMD width for various get_fpu_lowered_simd_width() restrictions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
421d43fe62 intel/fs/xe2+: Fixes for increased accumulator register width.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
bd98df5d8e intel/compiler: Make MAX_VGRF_SIZE macro depend on devinfo and update it for Xe2.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Iván Briano
4eddeea7bf intel/fs: handle URB setup for fast linked mesh pipelines
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.

Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
2023-09-12 02:51:31 +00:00
Lionel Landwerlin
c9739e8912 intel/fs: limit register flag interaction of FIND_*LIVE_CHANNEL
Those instructions do not access the flag registers on Gfx8+. Removing
the interaction enables CSE to remove more of those instructions.

Results are a bit mixed (DG2 vulkan fossils):

  ACO:
  Totals from 127 (5.97% of 2128) affected shaders:
  Instrs: 139966 -> 138972 (-0.71%); split: -0.85%, +0.14%
  Cycles: 1685747 -> 1667480 (-1.08%); split: -2.35%, +1.26%
  Max live registers: 10582 -> 10544 (-0.36%)
  Max dispatch width: 1048 -> 1040 (-0.76%)

  Cyberpunk 2077:
  Totals from 2879 (27.95% of 10301) affected shaders:
  Instrs: 4264789 -> 4225666 (-0.92%); split: -1.01%, +0.09%
  Cycles: 72380209 -> 71619521 (-1.05%); split: -1.63%, +0.58%
  Subgroup size: 30624 -> 30632 (+0.03%)
  Spill count: 98 -> 101 (+3.06%)
  Fill count: 90 -> 93 (+3.33%)
  Scratch Memory Size: 8192 -> 9216 (+12.50%)
  Max live registers: 217807 -> 217098 (-0.33%); split: -0.59%, +0.26%
  Max dispatch width: 23792 -> 24112 (+1.34%)

  Gaining 40 SIMD16 shaders

  Rise Of The Tomb Raider:
  Totals from 622 (5.06% of 12289) affected shaders:
  Instrs: 437380 -> 434760 (-0.60%); split: -0.72%, +0.12%
  Cycles: 261843085 -> 261580703 (-0.10%); split: -0.73%, +0.63%
  Max live registers: 27731 -> 27766 (+0.13%); split: -1.01%, +1.14%
  Max dispatch width: 5832 -> 5432 (-6.86%); split: +0.27%, -7.13%

  Loosing 26 SIMD32 shaders

  Strange Brigade:
  Totals from 1298 (31.48% of 4123) affected shaders:
  Instrs: 1504408 -> 1487968 (-1.09%); split: -1.17%, +0.08%
  Cycles: 20735976 -> 20443216 (-1.41%); split: -1.60%, +0.19%
  Max live registers: 89911 -> 89957 (+0.05%)

DG2 shader-db run:

  total instructions in shared programs: 23130895 -> 23130036 (<.01%)
  instructions in affected programs: 260956 -> 260097 (-0.33%)
  helped: 234
  HURT: 101
  helped stats (abs) min: 1 max: 54 x̄: 6.36 x̃: 4
  helped stats (rel) min: 0.05% max: 8.16% x̄: 2.01% x̃: 1.90%
  HURT stats (abs)   min: 1 max: 37 x̄: 6.23 x̃: 3
  HURT stats (rel)   min: 0.02% max: 5.67% x̄: 0.89% x̃: 0.55%
  95% mean confidence interval for instructions value: -3.62 -1.51
  95% mean confidence interval for instructions %-change: -1.33% -0.94%
  Instructions are helped.

  total loops in shared programs: 6071 -> 6071 (0.00%)
  loops in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total cycles in shared programs: 898610645 -> 898557166 (<.01%)
  cycles in affected programs: 18308201 -> 18254722 (-0.29%)
  helped: 315
  HURT: 48
  helped stats (abs) min: 1 max: 19312 x̄: 404.23 x̃: 128
  helped stats (rel) min: 0.02% max: 28.98% x̄: 3.92% x̃: 2.65%
  HURT stats (abs)   min: 2 max: 14478 x̄: 1538.60 x̃: 409
  HURT stats (rel)   min: <.01% max: 23.24% x̄: 3.34% x̃: 0.41%
  95% mean confidence interval for cycles value: -333.68 39.03
  95% mean confidence interval for cycles %-change: -3.51% -2.41%
  Inconclusive result (value mean confidence interval includes 0).

  total spills in shared programs: 5964 -> 5964 (0.00%)
  spills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total fills in shared programs: 6909 -> 6909 (0.00%)
  fills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total sends in shared programs: 1040266 -> 1040266 (0.00%)
  sends in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  LOST:   3
  GAINED: 1

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24553>
2023-09-06 14:47:40 +00:00
Kenneth Graunke
08fc4603dd intel/fs: Dump IR for pre-RA scheduler modes in DEBUG_OPTIMIZER
This lets us more easily compare and contrast the various scheduling
options that the compiler considered.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
07f2ad32e4 intel/fs: Pick the lowest register pressure schedule when spilling
We try various pre-RA scheduler modes and see if any of them allow
us to register allocate without spilling.  If all of them spill,
however, we left it on the last mode: LIFO.  This is unfortunately
sometimes significantly worse than other modes (such as "none").

This patch makes us instead select the pre-RA scheduling mode that
gives the lowest register pressure estimate, if none of them manage
to avoid spilling.  The hope is that this scheduling will spill the
least out of all of them.

fossil-db stats (on Alchemist) speak for themselves:

    Totals:
    Instrs: 197297092 -> 195326552 (-1.00%); split: -1.02%, +0.03%
    Cycles: 14291286956 -> 14303502596 (+0.09%); split: -0.55%, +0.64%
    Spill count: 190886 -> 129204 (-32.31%); split: -33.01%, +0.70%
    Fill count: 361408 -> 225038 (-37.73%); split: -39.17%, +1.43%
    Scratch Memory Size: 12935168 -> 10868736 (-15.98%); split: -16.08%, +0.10%

    Totals from 1791 (0.27% of 668386) affected shaders:
    Instrs: 7628929 -> 5658389 (-25.83%); split: -26.50%, +0.67%
    Cycles: 719326691 -> 731542331 (+1.70%); split: -10.95%, +12.65%
    Spill count: 110627 -> 48945 (-55.76%); split: -56.96%, +1.20%
    Fill count: 221560 -> 85190 (-61.55%); split: -63.89%, +2.34%
    Scratch Memory Size: 4471808 -> 2405376 (-46.21%); split: -46.51%, +0.30%

Improves performance when using XeSS in Cyberpunk 2077 by 90% on A770.
Improves performance of Borderlands 3 by 1.54% on A770.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
158ac265df intel/fs: Make helpers for saving/restoring instruction order
This moves a bit of code out of a large function, but also lets us reuse
it a few extra places in the next commit.

I opted to stop using ralloc here since this is short-lived data that
doesn't need to stick around for the rest of the compile, and it's easy
enough to free.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
2dd56921c9 intel/fs: Index scheduler mode string table by mode enum
pre_modes[] is an array with the modes ordered in our desired
preference.  scheduler_mode_name[] was also in that order, and the two
had to be kept in sync.  This is a little silly; we should just have
a mode enum -> string table and look it up via the enum.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
7eba19245d intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions()
I'm going to introduce another call site for this function, and just
handling SCHEDULE_NONE in the scheduler itself makes more sense than
duplicating the logic.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
743fd60bea intel/fs: Account for payload GRFs when calculating register pressure
The register pressure analysis I wrote in 2013 only considered VGRFs,
and not other GRFs, such as payload registers and push constants.  We
need to consider those too, because payload registers definitely occupy
space and add to pressure.

In 2015, Connor already made the scheduler account for this, so the only
real use for this is in shader statistic dumps and optimizer printouts.
But we should make it more accurate.  (We will use it in more places
shortly, a few commits from now.)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
d7daf78f62 intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL for DEBUG_OPTIMIZER
If the NIR_DEBUG_PRINT_INTERNAL flag is not set, don't print debugging
information for internal shaders in INTEL_DEBUG=optimizer dumps.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24684>
2023-08-17 18:19:53 +00:00
Matt Turner
d142c845d0 Revert "intel/fs: only avoid SIMD32 if strictly inferior in throughput"
This reverts commit 6b494745be.

The logic is not entirely correct: the comparison is between two
static-analysis estimates of a dynamic system with variables that aren't
captured by the shader source, so using ">" will always have greater potential
to cause regressions whenever the performance difference between the two builds
is something not captured by the static model, no matter how much the model is
improved.

Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9262
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24615>
2023-08-16 14:56:15 +00:00
Faith Ekstrand
4695bebc79 nir: Drop nir_dest
Instead, we replace every use of it with nir_def.  Most of this commit
was generated by sed:

   sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp

A few manual fixups were required in lima and the nir_legacy code.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Lionel Landwerlin
6f694432e4 intel/fs: add variable for output of debug backend optimizer
It can be useful to compare 2 runs with different compiler changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
2023-08-10 06:39:57 +00:00
Lionel Landwerlin
0e244d56e3 intel/fs: track more steps with INTEL_DEBUG=optimizer
One particular nice thing to have is the first generated backend IR
before validation. Especially if you made a mistake in the NIR
translation, you can at least look at it before validation tells you
off.

Then the last 2 steps of the optimize() function can be interesting to
look at.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
2023-08-10 06:39:57 +00:00
Lionel Landwerlin
9934613c74 anv/hasvk: track robustness per pipeline stage
And split them into UBO and SSBO

v2 (Lionel):
 - Get rid of robustness fields in anv_shader_bin
v3 (Lionel):
 - Do not pass unused parameters around

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17545>
2023-08-09 09:00:12 +03:00
Alyssa Rosenzweig
ab0d878932 treewide: Remove more is_ssa asserts
Stuff Coccinelle missed.

   sed -i -e '/assert(.*\.is_ssa)/d' $(git grep -l is_ssa)
   sed -i -e '/ASSERT.*\.is_ssa)/d' $(git grep -l is_ssa)

+ a manual fixup to restore the assert for parallel copy lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24432>
2023-08-03 22:40:28 +00:00
Yonggang Luo
86bcc90c0e intel/compiler,intel/blorp,intel/vulkan: decouple vulkan driver and compiler from gallium
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24438>
2023-08-03 22:00:15 +00:00
Lionel Landwerlin
d33aff783d intel/fs: add support for sparse accesses
Purely from the backend point of view it's just an additional
parameter to sampler messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:30 +03:00
Lionel Landwerlin
d099e47de0 intel/fs: add more UNDEFs around SEND messages
lower_find_live_channel() in particular is used a lot in control flow
to find the live channel for the surface/sampler handle. Adding UNDEFs
on the temporary registers used for finding the live channels helps
reduce the liveness of those temporary registers, especially in loops.

Some titles affected :

Rise Of The Tomb Raider:
Totals from 2780 (22.58% of 12311) affected shaders:
Instrs: 1294455 -> 1294592 (+0.01%); split: -0.15%, +0.16%
Cycles: 1473136441 -> 1471302617 (-0.12%); split: -1.52%, +1.40%
Max live registers: 144282 -> 143595 (-0.48%)
Max dispatch width: 22200 -> 22232 (+0.14%)

Red Dead Redemption 2:
Totals from 435 (7.28% of 5972) affected shaders:
Instrs: 488472 -> 487594 (-0.18%); split: -0.31%, +0.14%
Cycles: 11354732 -> 11384928 (+0.27%); split: -0.44%, +0.71%
Spill count: 1217 -> 1172 (-3.70%)
Fill count: 3521 -> 3447 (-2.10%)
Scratch Memory Size: 64512 -> 62464 (-3.17%)
Max live registers: 35997 -> 35798 (-0.55%)

Fallout 4:
Totals from 8 (0.49% of 1638) affected shaders:
Instrs: 41908 -> 40509 (-3.34%)
Cycles: 3638464 -> 3555680 (-2.28%); split: -2.67%, +0.39%
Spill count: 717 -> 665 (-7.25%)
Fill count: 2542 -> 2438 (-4.09%)
Scratch Memory Size: 32768 -> 16384 (-50.00%)
Max live registers: 567 -> 534 (-5.82%)

Cyberpunk 2077:
Totals from 2984 (28.97% of 10301) affected shaders:
Instrs: 3888874 -> 3891600 (+0.07%); split: -0.20%, +0.27%
Cycles: 67906489 -> 67767721 (-0.20%); split: -0.68%, +0.47%
Spill count: 200 -> 98 (-51.00%)
Fill count: 237 -> 90 (-62.03%)
Scratch Memory Size: 10240 -> 8192 (-20.00%)
Max live registers: 215715 -> 212727 (-1.39%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:33 +00:00
Lionel Landwerlin
5c72724819 intel/fs: consider UNDEF as non-partial write
A few titles show max live register reductions, but nothing
significant in instruction count or other stats.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:32 +00:00
Ian Romanick
cb0de0a1d3 intel/fs: Constant fold OR and AND
The path taken in fs_visitor::swizzle_nir_scratch_addr for DG2 generates
some AND and OR instructions before the SHL. This commit folds those so
the whold calculation becomes a constant (like on older platforms).

v2: Fix return type of src_as_uint. Noticed by Marcin.

shader-db results:

DG2
total instructions in shared programs: 23190475 -> 23179540 (-0.05%)
instructions in affected programs: 36026 -> 25091 (-30.35%)
helped: 7 / HURT: 0

total cycles in shared programs: 841196807 -> 841142563 (<.01%)
cycles in affected programs: 1660670 -> 1606426 (-3.27%)
helped: 7 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780372 -> 197773966 (-0.00%)
Cycles: 14066410782 -> 14066399378 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 8438104 -> 8438112 (+0.00%)
Send messages: 8049445 -> 8049446 (+0.00%)
Scratch Memory Size: 14263296 -> 14264320 (+0.01%)

Totals from 9 (0.00% of 668055) affected shaders:
Instrs: 24547 -> 18141 (-26.10%)
Cycles: 1984791 -> 1973387 (-0.57%); split: -0.98%, +0.40%
Subgroup size: 88 -> 96 (+9.09%)
Send messages: 867 -> 868 (+0.12%)
Scratch Memory Size: 69632 -> 70656 (+1.47%)

No fossil-db changes on any older Intel platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00