Commit graph

80 commits

Author SHA1 Message Date
Caio Oliveira
4f246cf4e7 intel/compiler: Merge child/latency arrays in schedule_node
Values are used together, saves one pointer in schedule_node,
reduces amount of reallocations when children count grows.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
e59a054203 intel/compiler: Move FS specific fields to fs_instruction_scheduler
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
a6297d05ca intel/compiler: Remove virtual calls from scheduler
Pull run() and schedule_instructions() for fs, and pull a very
simplified version of those into a run() for vec4.  Because of the
previous patches the duplication is small.

Since we are touching these, change run() implementations to use the
cfg from the existing reference to the visitor/shader instead of taking
one as argument.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
d76d58cf50 intel/compiler: Cache issue_time information
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
ecd7ffcf78 intel/compiler: Extract scheduling related basic functions
Those will be used in multiple places later.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
8a8dd2db0c intel/compiler: Add only available instructions to scheduling list
The list was used for iterating through all instructions and then
later also to track the available ones.  Now that the array iteration
is used, change how we fill it and rename it to reflect its only job.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
ddff6428c5 intel/compiler: Use array to iterate the scheduler nodes
For all the preparation data collection before the scheduling
actually happens, it is possible to walk the schedule nodes
in order by iterating on the range of the array dedicated to
a given block.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
fe6ac5a184 intel/compiler: Allocate all schedule_nodes at once
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
be012055da intel/compiler: Remove reference to brw_isa_info from schedule_node
It is always the same for all nodes, so use the one available in the
scheduler itself.

Also, per Matt's suggestion, collect is_haswell from devinfo instead of
from a function argument.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
6987571737 intel/compiler: Use linear allocator in parts of brw_schedule_instructions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Francisco Jerez
80e9031b44 intel/fs/xe2+: Fix grf_count in post-RA scheduling for updated register file size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Kenneth Graunke
7eba19245d intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions()
I'm going to introduce another call site for this function, and just
handling SCHEDULE_NONE in the scheduler itself makes more sense than
duplicating the logic.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Emma Anholt
10b94772d2 intel: Reduce cost of resetting last_grf_write.
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler.  Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
7d4769e802 intel: Allocate the last_grf_write once per scheduler.
No need to re-calloc it per block when we're going to use it again.  Also,
this fixes the vec4 backend to avoid allocating giant grf_count-sized
arrays on the stack.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
2ad865b219 intel: Count reads_remaining across all blocks.
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?".  Saves some memsetting, which was showing up high in the
profile (but not from this source).

No change on iris SKL shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:55 +00:00
Lionel Landwerlin
a66944dfbc intel/fs: reuse descriptor helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:36 +00:00
Lionel Landwerlin
9471ffa70a intel/fs: fix scheduling of HALT instructions
With the following test :

dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_out_of_bounds_load

There is a :

shader_start:
   ...                                 <- no control flow
   g0 = some_alu
   g1 = fbl
   g2 = broadcast g3, g1
   g4 = get_buffer_size g2
   ...                                 <- no control flow
   halt                                <- on some lanes
   g5 = send <surface>, g4

eliminate_find_live_channel will remove the fbl/broadcast because it
assumes lane0 is active at get_buffer_size :

shader_start:
   ...                                 <- no control flow
   g0 = some_alu
   g4 = get_buffer_size g0
   ...                                 <- no control flow
   halt                                <- on some lanes
   g5 = send <surface>, g4

But then the instruction scheduler will move the get_buffer_size after
the halt :

shader_start:
   ...                                 <- no control flow
   halt                                <- on some lanes
   g0 = some_alu
   g4 = get_buffer_size g0
   g5 = send <surface>, g4

get_buffer_size pulls the surface index from lane0 in g0 which could
have been turned off by the halt and we end up accessing an invalid
surface handle.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20765>
2023-05-05 00:43:25 +03:00
Jason Ekstrand
714a291673 intel/compiler: Use SHADER_OPCODE_SEND for PI messages
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Lionel Landwerlin
13cca48920 intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.

The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.

No shader-db changes on TGL & DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
2023-01-26 11:26:53 +00:00
Ian Romanick
bdc7668008 intel/fs: Lower URB messages to SEND
Before rebasing on top of Ken's split-SEND optimization (see !17018),
this commit just caused some scheduling changes in various tessellation
and geometry shaders.  These changes were caused by the addition of real
latency information for the URB messages.

With the addition of the split-SEND optimization, the changes
are... staggering.  All of the shaders helped for spills and fills are
vertex shaders from Batman Arkham Origins.  What surprises me is that
these shaders account for such a high percentage of the spills and fills
in fossil-db.  85%?!?

v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion.
Suggested by Ken.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20013625 -> 19954020 (-0.30%)
instructions in affected programs: 4007157 -> 3947552 (-1.49%)
helped: 31161
HURT: 0
helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2
helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83%
95% mean confidence interval for instructions value: -1.97 -1.86
95% mean confidence interval for instructions %-change: -2.22% -2.18%
Instructions are helped.

total cycles in shared programs: 859337569 -> 858636788 (-0.08%)
cycles in affected programs: 74168298 -> 73467517 (-0.94%)
helped: 13812
HURT: 16846
helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4
helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02%
HURT stats (abs)   min: 1 max: 1543 x̄: 26.31 x̃: 14
HURT stats (rel)   min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58%
95% mean confidence interval for cycles value: -55.10 9.39
95% mean confidence interval for cycles %-change: 0.62% 0.77%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total cycles in shared programs: 904844939 -> 904832320 (<.01%)
cycles in affected programs: 525360 -> 512741 (-2.40%)
helped: 215
HURT: 4
helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39
helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04%
HURT stats (abs)   min: 79 max: 79 x̄: 79.00 x̃: 79
HURT stats (rel)   min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43%
95% mean confidence interval for cycles value: -75.02 -40.22
95% mean confidence interval for cycles %-change: -2.37% -1.81%
Cycles are helped.

No shader-db changes on any older Intel platforms.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 142622800 -> 141461114 (-0.8%)
Instructions helped: 197186

Cycles in all programs: 9101223846 -> 9099440025 (-0.0%)
Cycles helped: 37963
Cycles hurt: 151233

Spills in all programs: 98829 -> 13695 (-86.1%)
Spills helped: 2159

Fills in all programs: 128142 -> 18400 (-85.6%)
Fills helped: 2159

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Kenneth Graunke
72e9843991 intel/compiler: Introduce a new brw_isa_info structure
This structure will contain the opcode mapping tables in the next
commit.  For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Lionel Landwerlin
361b3fee3c intel: move away from booleans to identify platforms
v2: Drop changes around GFX_VERx10 == 75 (Luis)

v3: Replace
   (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT)
   by
   (devinfo->platform == INTEL_PLATFORM_IVB)
   Replace
   (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 >= 45)
   Replace
   (devinfo->platform != INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 != 45)

v4: Fix crocus typo

v5: Rebase

v6: Add GFX3, ILK & I965 platforms (Jordan)
    Move ifdef to code expressions (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>
2021-11-08 16:48:06 +00:00
Dave Airlie
8a81d14271 intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5
This is the equivalent of idr's
intel/fs: sel.cond writes the flags on Gfx4 and Gfx5

except for the vec4 backend.

This fixes buggy rendering seen with crocus on a qt trace.

v2 (idr): Trivial whitespace change.  Add unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Iron Lake
total instructions in shared programs: 8183077 -> 8184543 (0.02%)
instructions in affected programs: 198990 -> 200456 (0.74%)
helped: 0
HURT: 1355
HURT stats (abs)   min: 1 max: 8 x̄: 1.08 x̃: 1
HURT stats (rel)   min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.96% 1.03%
Instructions are HURT.

total cycles in shared programs: 238967672 -> 238962784 (<.01%)
cycles in affected programs: 4666014 -> 4661126 (-0.10%)
helped: 406
HURT: 314
helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18
helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65%
HURT stats (abs)   min: 2 max: 112 x̄: 13.48 x̃: 12
HURT stats (rel)   min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16%
95% mean confidence interval for cycles value: -8.60 -4.98
95% mean confidence interval for cycles %-change: -0.87% -0.49%
Cycles are helped.

GM45
total instructions in shared programs: 4986888 -> 4988354 (0.03%)
instructions in affected programs: 198990 -> 200456 (0.74%)
helped: 0
HURT: 1355
HURT stats (abs)   min: 1 max: 8 x̄: 1.08 x̃: 1
HURT stats (rel)   min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.96% 1.03%
Instructions are HURT.

total cycles in shared programs: 153577826 -> 153572938 (<.01%)
cycles in affected programs: 4666014 -> 4661126 (-0.10%)
helped: 406
HURT: 314
helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18
helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65%
HURT stats (abs)   min: 2 max: 112 x̄: 13.48 x̃: 12
HURT stats (rel)   min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16%
95% mean confidence interval for cycles value: -8.60 -4.98
95% mean confidence interval for cycles %-change: -0.87% -0.49%
Cycles are helped.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:32 -07:00
Ian Romanick
38807ceeae intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction.  This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation.  As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.

No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK.  That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).

Shader-db results for ILK and GM45 are below.  I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation.  This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(

I discovered this bug while working on a couple new optimization
passes.  One of the passes attempts to remove condition modifiers that
are never used.  The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.

v2: Trivial changes in the unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

v4: Tweak handling of BRW_OPCODE_SEL special case.  Suggested by Jason.

Fixes: df1aec763e ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>

Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs)   min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.

total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs)   min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs)   min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.

total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs)   min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:20 -07:00
Lionel Landwerlin
91dcbf1f56 intel/compiler: Track latency/perf of LSC fences
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11759>
2021-07-12 11:39:03 +00:00
Sagar Ghuge
621cf9b1df intel/fs: Lower Byte scattered r/w messages to LSC when available
v2 (Jason Ekstrand):
 - Squash in brw_scheduler changes
 - Update brw_ir_performance

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Sagar Ghuge
8f82c8aa1a intel/fs: Lower untyped float atomic messages to LSC when available
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Mark Janes
bd40a1e8c9 intel/fs: Lower untyped atomic messages to LSC when available
Bspec programming note metions that "Atomic messages are always forced
to "un-cacheable" in the L1 cache". We can make the L1 cache
un-cacheable and L3 with write-back policy.

v2: (Sagar Ghuge):
 - Fix caching policy for atomic messages
 - Fix simd exec size

v3: (Sagar Ghuge):
 - Add atomic messages to brw_schedule_instructions

v4: (Jason Ekstrand):
 - Rebase on lsc_msg_desc reworks

Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Mark Janes
4f86a70599 intel/fs: Lower DW untyped r/w messages to LSC when available
This puts the basic infrastructure in place for lowering logical
dataport messages to LSC messages.  We start with the two most obvious
opcodes and add more in later patches.

v2 (Sagar Ghuge):
 - Pass required params to message desc
 - Remove duplicate mlen calculation
 - Change commit message.

v3 (Jason Ekstrand):
 - Drop TGM support

Co-authored-by: Jason Ekstrand <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Mark Janes
32ec0662fd intel/compiler: Add LSC messages to brw_schedule_instructions
v2 (Jason Ekstrand):
 - Use lsc_msg_desc_opcode()
 - Drop all opcodes for now and add them in later patches.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Lionel Landwerlin
d665c2dcf0 intel/compiler: use existing helpers to pull bits of descriptors
v2: Use new RT descriptor helper

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Jason Ekstrand
134af5ada2 intel/compiler: Don't insert barriers for NULL sources
Normally, we never see NULL in a source.  However, starting with
eab1c55590, we can with a SHADER_OPCODE_SEND if it only has the first
payload.  We were inserting barriers which adds unnecessary scheduling
dependencies and takes a lot of compile time because inserting a single
barrier is an O(n) operation.

All the extra O(n) can have a surprisingly large effect.  This cuts the
runtime of dEQP-VK.binding_model.buffer_device_address.set3.depth3.
basessbo.convertcheckuv2.store.single.std140.frag by a factor of 20x for
a debug build.

Shader-db results on ICL:

    total instructions in shared programs: 19918983 -> 19921610 (0.01%)
    instructions in affected programs: 884074 -> 886701 (0.30%)
    helped: 1688
    HURT: 817
    helped stats (abs) min: 1 max: 163 x̄: 4.23 x̃: 1
    helped stats (rel) min: 0.02% max: 12.50% x̄: 1.08% x̃: 0.61%
    HURT stats (abs)   min: 1 max: 2674 x̄: 11.95 x̃: 2
    HURT stats (rel)   min: 0.11% max: 70.22% x̄: 1.71% x̃: 1.03%
    95% mean confidence interval for instructions value: -1.97 4.06
    95% mean confidence interval for instructions %-change: -0.28% -0.06%
    Inconclusive result (value mean confidence interval includes 0).

    total cycles in shared programs: 976503324 -> 975884809 (-0.06%)
    cycles in affected programs: 82581703 -> 81963188 (-0.75%)
    helped: 4144
    HURT: 5010
    helped stats (abs) min: 1 max: 79294 x̄: 311.31 x̃: 8
    helped stats (rel) min: <.01% max: 53.69% x̄: 2.00% x̃: 0.51%
    HURT stats (abs)   min: 1 max: 92266 x̄: 134.04 x̃: 8
    HURT stats (rel)   min: <.01% max: 218.09% x̄: 3.25% x̃: 0.53%
    95% mean confidence interval for cycles value: -119.85 -15.29
    95% mean confidence interval for cycles %-change: 0.68% 1.07%
    Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

    total spills in shared programs: 10659 -> 12014 (12.71%)
    spills in affected programs: 441 -> 1796 (307.26%)
    helped: 7
    HURT: 12

    total fills in shared programs: 11551 -> 14429 (24.92%)
    fills in affected programs: 993 -> 3871 (289.83%)
    helped: 8
    HURT: 11

    total sends in shared programs: 1025832 -> 1025353 (-0.05%)
    sends in affected programs: 2241 -> 1762 (-21.37%)
    helped: 105
    HURT: 1
    helped stats (abs) min: 1 max: 87 x̄: 4.57 x̃: 2
    helped stats (rel) min: 5.56% max: 54.72% x̄: 11.37% x̃: 10.00%
    HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel)   min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
    95% mean confidence interval for sends value: -7.39 -1.65
    95% mean confidence interval for sends %-change: -12.95% -7.70%
    Sends are helped.

    LOST:   93
    GAINED: 109

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4648
Fixes: eab1c55590 "intel/fs: Support SENDS in SHADER_OPCODE_SEND"
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10412>
2021-04-22 18:00:16 +00:00
Anuj Phogat
61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Anuj Phogat
e7e55af4d6 intel: Rename GENx keyword to GFXx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/GEN\([[:digit:]]\+\)/GFX\1/g"

Exclude the changes to modifiers:
grep -E "I915_.*GFX" -rIl $SEARCH_PATH | xargs sed -ie "s/\(I915_.*\)GFX/\1GEN/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
1d296484b4 intel: Rename Genx keyword to Gfxx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g"

Exclude changes in src/intel/perf/oa-*.xml:
find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
b75f095bc7 intel: Rename genx keyword to gfxx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g"

Exclude pack.h and xml changes in this patch:
grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g"
grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
c1f3a778de intel: Rename GENx prefix in macros to GFXx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN" -rIl src/intel/genxml | grep -E ".*py" |  xargs sed -ie "s/GEN\([%{]\)/GFX\1/g"
grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".*(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g"

Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+":
grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Jason Ekstrand
91192696e6 intel/fs: Add support for 16-bit A64 float and integer atomics
The messages for those 16-bit operations still use 32-bit sources and
destinations, so expand them accordingly when building the payload.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8750>
2021-03-18 00:13:40 +00:00
Lionel Landwerlin
8b6d22109f intel/fs/vec4: add missing dependency in write-on-write fixed GRFs
If we load constant data using pull constant SENDS, and we later load
that register with some other data, we can end up in a situation where
we don't track the initial fixed register write and therefore end up
using uninitialized registers.

This tracks write-on-write of fixed GRFs like we do for normal virtual
GRFs.

v2: Fix post_alloc_reg case (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9667>
2021-03-17 23:25:02 +00:00
Marcin Ślusarz
97c3ec6116 intel/compiler: cache computed register pressure benefit
This halves the number of calls to get_register_pressure_benefit
and decreases shader-db CPU time by ~1.5%.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8741>
2021-01-29 11:31:39 +00:00
Jason Ekstrand
f9d549b2bf intel/fs: Use BRW_OPCODE_HALT for discards
We're about to start using it to implement nir_jump_halt which has
nothing inherently to do with fragment shaders or discards.  May as well
name it for the HW instruction it generates.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
2020-12-01 16:19:08 -06:00
Jason Ekstrand
e76e359007 intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET
It's a bit more explicit and will play more nicely with what we're about
to do.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
2020-12-01 16:18:50 -06:00
Jason Ekstrand
75209d5bd1 intel/fs: Add and implement intel-specific ray-tracing intrinsics
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:10 +00:00
Jason Ekstrand
7280b0911d intel/compiler: Add support for bindless shaders
The Intel bindless thread dispatch model is very simple.  When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs.  These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned.  Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke.  When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers.  The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Caio Marcelo de Oliveira Filho
d372abe397 intel/fs: Add surface OWORD BLOCK opcodes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>
2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho
d3d2b73fa3 intel/fs: Add A64 OWORD BLOCK opcodes
Based on a patch for OWORD BLOCK READ from Jason Ekstrand.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>
2020-11-04 20:24:48 +00:00
Jason Ekstrand
5abac85177 intel/fs: Rework scratch handling on Gen9+
The current scratch mechanism uses an MRF hack where we reserve a few
GRF registers to treat like the MRF and we collect the data into that
MRF region before doing a scratch write.  We also use that region for
the header for scratch reads.

This commit changes things and gets rid of the MRF hack.  Instead, we
reserve a single register (which RA is free to pick) for the scratch
header and uses split sends for scratch writes to avoid having to do
the copy.  This should provide RA with more freedom in the presence of
spilling as well as avoid some unnecessary data moves.  In future, the
new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own
per-thread scratch base address calculations rather than depending on
the scratch base address that gets pushed into g0.  Having an opcode for
this lets us do it once at the top of the shader rather than repeating
it at every read/write.

One other noticeable difference is the use of SHADER_OPCODE_SEND.  We
can get away with this thanks to the fact that we're now using a set to
track which instructions are generated by spills and don't rely on the
opcodes to find spill/fill instructions.  This allows us to avoid adding
more virtual opcodes and let the normal code paths handle things like
scoreboard dependencies between header setup and the SEND.  It also
means that post-RA scheduling may be able to space out the header setup
MOV and the SEND for better latency hiding.

Shader-db results on Skylake:

    total spills in shared programs: 12137 -> 10604 (-12.63%)
    spills in affected programs: 6685 -> 5152 (-22.93%)
    helped: 274
    HURT: 2

    total fills in shared programs: 13065 -> 11515 (-11.86%)
    fills in affected programs: 9007 -> 7457 (-17.21%)
    helped: 275
    HURT: 1

Shader-db results on Ice Lake:

    total spills in shared programs: 12482 -> 10953 (-12.25%)
    spills in affected programs: 6586 -> 5057 (-23.22%)
    helped: 275
    HURT: 0

    total fills in shared programs: 12819 -> 11234 (-12.36%)
    fills in affected programs: 7867 -> 6282 (-20.15%)
    helped: 274
    HURT: 0

Shader-db results on Tigerlake:

    total spills in shared programs: 11689 -> 10233 (-12.46%)
    spills in affected programs: 4740 -> 3284 (-30.72%)
    helped: 259
    HURT: 0

    total fills in shared programs: 10840 -> 9443 (-12.89%)
    fills in affected programs: 6244 -> 4847 (-22.37%)
    helped: 259
    HURT: 0

Fossil-db results on Ice Lake:

    Spills in all programs: 245249 -> 201633 (-17.8%)
    Fills in all programs: 366066 -> 314368 (-14.1%)

More practically, this seems to give about a 0.5-1% perf boost in
Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Marcin Ślusarz
5ea0b6a9c6 intel/compiler: initialize remaining fields of various classes
These variables seem to be initialized before being used, so this
patch is not fixing any bug, but leaving them unitialized may become
a bug after some refactoring.

These classes were affected: fs_reg_alloc, fs_visitor, fs_generator,
instruction_scheduler.

Found by Coverity.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
2020-09-10 12:16:58 +00:00
Francisco Jerez
5e2a7e11b4 intel/ir: Remove scheduling-based cycle count estimates.
The cycle count estimation logic part of the scheduler is now
redundant with the shader performance modeling pass, and the estimates
can be consolidated into the brw::performance analysis result object
instead of being part of the CFG, which guarantees that the estimates
cannot be accessed without previously calling the
performance_analysis::require() method, which makes sure that the
right analysis pass is executed at the right time if we don't already
have up-to-date cached results.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:27 -07:00