Commit graph

68 commits

Author SHA1 Message Date
Lionel Landwerlin
d8b78924c5 brw: use a single virtual opcode to read ARF registers
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
2024-05-31 20:22:27 +00:00
Lionel Landwerlin
2c65d90bc8 intel/brw: ensure find_live_channel don't access arch register without sync
Another architecture register that requires some care before reading.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 49ee3ae9e8 ("intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+")
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29319>
2024-05-24 07:26:17 +00:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
007d891239 intel/brw: Use newer brw_type_is_* shorter names
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
d5b8cec7a2 intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN.  The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
12b0e03bd2 intel/brw: Use SHADER_OPCODE_SEND for coherent framebuffer reads
We already have a logical opcode and lower to what is basically a send
instruction.  We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
97bf3d3b2d intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND
There's no need for special handling here, it's just a send message
with a trivial g0 header and descriptor.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>
2024-03-05 11:16:20 +00:00
Kenneth Graunke
ad37622a8f intel/brw: Delete legacy texture opcodes
We first generate the logical opcodes, and these days fully lower to
SHADER_OPCODE_SEND.  In the past, we lowered to a non-logical variant
and handled that in the generator.  These days, we were just using the
non-logical opcodes as an awkward intermediate opcode change during
the lowering...which isn't really necessary at all.

This patch eliminates them by using the original logical opcodes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2024-03-01 22:19:51 +00:00
Kenneth Graunke
45a5e4c0c4 intel/brw: Delete SHADER_OPCODE_TXF_UMS
Nothing seems to generate this anymore.  I guess we always use CMS.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2024-03-01 22:19:51 +00:00
Kenneth Graunke
601ef12467 intel/brw: Delete SHADER_OPCODE_TXF_CMS[_LOGICAL]
We always use the wide variant (_W) on hardware this compiler supports.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2024-03-01 22:19:50 +00:00
Caio Oliveira
fb1d871714 intel/brw: Fold backend_reg into fs_reg
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>
2024-03-01 17:52:09 +00:00
Caio Oliveira
db322554a7 intel/brw: Use fs_inst explicitly in various passes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>
2024-02-29 20:47:48 -08:00
Caio Oliveira
559d94cd0d intel/brw: Use fs_visitor instead of backend_shader in various passes
And since we are touching them, rename a couple of passes
to follow same name convention as existing ones.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:05 +00:00
Caio Oliveira
8f3c52c1da intel/brw: Remove MRF type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5c93a0e125 intel/brw: Remove Gfx8- remaining opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
071e9f49f1 intel/brw: Remove F16TO32 and F32TO16 opcodes
These are done with MOVs and appropriate types in Gfx9+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
91c05d990a intel/brw: Remove Gfx8- code from IR performance analysis
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
c621f75e7b intel/brw: Remove now unused vec4-only opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
a641aa294e intel/brw: Remove vec4 backend
It still exists as part of ELK for older gfx versions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:37 +00:00
Ian Romanick
8fb37ef985 intel/fs: Add fast path for ballot(true)
This doesn't help very much now. A later commit adds a NIR optimization
pass, tentatively called nir_opt_uniform_subgroup, that converts many
kinds of subgroup operations to things involving
bitCount(ballot(true)). This commit makes a huge difference in the
results of that later commit.

No shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165558033 -> 165557519 (-0.00%)
Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00%

Totals from 299 (0.05% of 656117) affected shaders:
Instrs: 88293 -> 87779 (-0.58%)
Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03%

v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:37:46 -08:00
Sagar Ghuge
6f0ab5e4d5 intel/compiler: Add texture gather offset LOD/Bias message support
v2: (Ian)
- Space formatting on conditional statement

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Sagar Ghuge
79af0ac29a intel/compiler: Add gather4_i/l/[_c]/b sampler message
v2: (Ian)
- Format comment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Francisco Jerez
6efcba9e36 intel/ir/xe2+: Add support for 32 SBID tokens to performance model.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27165>
2024-01-20 19:55:31 +00:00
Ian Romanick
e666872c75 intel/compiler: Initial bits for DPAS instruction
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.

v3: Prevent lower_regioning from trying to "fix" DPAS sources.

v4: Add instruction latency information for scheduling and perf
estimates.

v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.

v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Francisco Jerez
23e14a6c27 intel/eu/xe2+: Add definition for size of GRF space on Xe2.
And use it in various places in the compiler that require knowledge
about the size of the register file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
2023-11-08 23:17:24 -08:00
Francisco Jerez
421d43fe62 intel/fs/xe2+: Fixes for increased accumulator register width.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Jason Ekstrand
714a291673 intel/compiler: Use SHADER_OPCODE_SEND for PI messages
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Lionel Landwerlin
13cca48920 intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.

The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.

No shader-db changes on TGL & DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
2023-01-26 11:26:53 +00:00
Ian Romanick
bbcb881f46 intel/fs: Remove non-_LOGICAL URB messages
The _LOGICAL versions are lowered direct to SEND, so nothing can ever
generate these messages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick
bdc7668008 intel/fs: Lower URB messages to SEND
Before rebasing on top of Ken's split-SEND optimization (see !17018),
this commit just caused some scheduling changes in various tessellation
and geometry shaders.  These changes were caused by the addition of real
latency information for the URB messages.

With the addition of the split-SEND optimization, the changes
are... staggering.  All of the shaders helped for spills and fills are
vertex shaders from Batman Arkham Origins.  What surprises me is that
these shaders account for such a high percentage of the spills and fills
in fossil-db.  85%?!?

v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion.
Suggested by Ken.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20013625 -> 19954020 (-0.30%)
instructions in affected programs: 4007157 -> 3947552 (-1.49%)
helped: 31161
HURT: 0
helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2
helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83%
95% mean confidence interval for instructions value: -1.97 -1.86
95% mean confidence interval for instructions %-change: -2.22% -2.18%
Instructions are helped.

total cycles in shared programs: 859337569 -> 858636788 (-0.08%)
cycles in affected programs: 74168298 -> 73467517 (-0.94%)
helped: 13812
HURT: 16846
helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4
helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02%
HURT stats (abs)   min: 1 max: 1543 x̄: 26.31 x̃: 14
HURT stats (rel)   min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58%
95% mean confidence interval for cycles value: -55.10 9.39
95% mean confidence interval for cycles %-change: 0.62% 0.77%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total cycles in shared programs: 904844939 -> 904832320 (<.01%)
cycles in affected programs: 525360 -> 512741 (-2.40%)
helped: 215
HURT: 4
helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39
helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04%
HURT stats (abs)   min: 79 max: 79 x̄: 79.00 x̃: 79
HURT stats (rel)   min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43%
95% mean confidence interval for cycles value: -75.02 -40.22
95% mean confidence interval for cycles %-change: -2.37% -1.81%
Cycles are helped.

No shader-db changes on any older Intel platforms.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 142622800 -> 141461114 (-0.8%)
Instructions helped: 197186

Cycles in all programs: 9101223846 -> 9099440025 (-0.0%)
Cycles helped: 37963
Cycles hurt: 151233

Spills in all programs: 98829 -> 13695 (-86.1%)
Spills helped: 2159

Fills in all programs: 128142 -> 18400 (-85.6%)
Fills helped: 2159

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick
b909ac350f intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix
An argument could be made that all stage-specific opcodes for vec4
stages should be prefixed with VEC4_ like the stage-agnostic opcodes.
I'll leave those additional sed jobs for another day.

    egrep -lr '(VS|GS|TCS)_OPCODE_URB_WRITE' src |\
    while read f; do
        sed --in-place 's/\(VS\|GS\|TCS\)_OPCODE_URB_WRITE/VEC4_\1_OPCODE_URB_WRITE/g' $f
    done

    egrep -lr 'T.S_OPCODE[_A-Z]*URB_OFFSETS' src |\
    while read f; do
        sed --in-place 's/\(T.S_OPCODE[_A-Z]*URB_OFFSETS\)/VEC4_\1/g' $f
    done

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Kenneth Graunke
72e9843991 intel/compiler: Introduce a new brw_isa_info structure
This structure will contain the opcode mapping tables in the next
commit.  For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke
6fa66ac228 intel/compiler: Implement nir_intrinsic_last_invocation
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V.  However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().

We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL.  A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00
Lionel Landwerlin
3dabe93257 intel/fs: rework dss_id opcode into generic opcode
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.

v2: Keep previous comment.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:24 +00:00
Kenneth Graunke
7325179bcb intel/compiler: Use uppercase enum values in brw_ir_performance.cpp
This is by far the more common style in Mesa.  It also gives a cue that
e.g. num_dependency_ids is a fixed definition rather than some kind of
local variable maintaining a count.

While hre, we also rename the enums to have full prefixes to prepare for
a future where we use them in multiple files for future backend work.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14182>
2021-12-16 09:00:57 +00:00
Lionel Landwerlin
361b3fee3c intel: move away from booleans to identify platforms
v2: Drop changes around GFX_VERx10 == 75 (Luis)

v3: Replace
   (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT)
   by
   (devinfo->platform == INTEL_PLATFORM_IVB)
   Replace
   (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 >= 45)
   Replace
   (devinfo->platform != INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 != 45)

v4: Fix crocus typo

v5: Rebase

v6: Add GFX3, ILK & I965 platforms (Jordan)
    Move ifdef to code expressions (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>
2021-11-08 16:48:06 +00:00
Jason Ekstrand
e6a9501aa2 intel/fs: Add the URB fence message
When they re-arranged all the dataport stuff and added the LSC, doing
URB fencing through the dataport no longer makes sense.  Instead, there
is now a fence message on the URB shared function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
2021-09-29 20:52:54 +00:00
Ian Romanick
0f809dbf40 intel/compiler: Basic support for DP4A instruction
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Dave Airlie
8a81d14271 intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5
This is the equivalent of idr's
intel/fs: sel.cond writes the flags on Gfx4 and Gfx5

except for the vec4 backend.

This fixes buggy rendering seen with crocus on a qt trace.

v2 (idr): Trivial whitespace change.  Add unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Iron Lake
total instructions in shared programs: 8183077 -> 8184543 (0.02%)
instructions in affected programs: 198990 -> 200456 (0.74%)
helped: 0
HURT: 1355
HURT stats (abs)   min: 1 max: 8 x̄: 1.08 x̃: 1
HURT stats (rel)   min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.96% 1.03%
Instructions are HURT.

total cycles in shared programs: 238967672 -> 238962784 (<.01%)
cycles in affected programs: 4666014 -> 4661126 (-0.10%)
helped: 406
HURT: 314
helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18
helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65%
HURT stats (abs)   min: 2 max: 112 x̄: 13.48 x̃: 12
HURT stats (rel)   min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16%
95% mean confidence interval for cycles value: -8.60 -4.98
95% mean confidence interval for cycles %-change: -0.87% -0.49%
Cycles are helped.

GM45
total instructions in shared programs: 4986888 -> 4988354 (0.03%)
instructions in affected programs: 198990 -> 200456 (0.74%)
helped: 0
HURT: 1355
HURT stats (abs)   min: 1 max: 8 x̄: 1.08 x̃: 1
HURT stats (rel)   min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.96% 1.03%
Instructions are HURT.

total cycles in shared programs: 153577826 -> 153572938 (<.01%)
cycles in affected programs: 4666014 -> 4661126 (-0.10%)
helped: 406
HURT: 314
helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18
helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65%
HURT stats (abs)   min: 2 max: 112 x̄: 13.48 x̃: 12
HURT stats (rel)   min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16%
95% mean confidence interval for cycles value: -8.60 -4.98
95% mean confidence interval for cycles %-change: -0.87% -0.49%
Cycles are helped.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:32 -07:00
Ian Romanick
38807ceeae intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction.  This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation.  As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.

No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK.  That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).

Shader-db results for ILK and GM45 are below.  I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation.  This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(

I discovered this bug while working on a couple new optimization
passes.  One of the passes attempts to remove condition modifiers that
are never used.  The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.

v2: Trivial changes in the unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

v4: Tweak handling of BRW_OPCODE_SEL special case.  Suggested by Jason.

Fixes: df1aec763e ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>

Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs)   min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.

total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs)   min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs)   min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.

total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs)   min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:20 -07:00
Sagar Ghuge
705285b9f4 intel/compiler: Add support for ternary add instruction on XeHP
v2:
- Re-arragne opcode in correct order (Matt Turner)
- Move ADD3 case closer to LRP (Jason)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:56 +00:00
Lionel Landwerlin
91dcbf1f56 intel/compiler: Track latency/perf of LSC fences
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11759>
2021-07-12 11:39:03 +00:00
Sagar Ghuge
621cf9b1df intel/fs: Lower Byte scattered r/w messages to LSC when available
v2 (Jason Ekstrand):
 - Squash in brw_scheduler changes
 - Update brw_ir_performance

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Sagar Ghuge
8f82c8aa1a intel/fs: Lower untyped float atomic messages to LSC when available
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Mark Janes
bd40a1e8c9 intel/fs: Lower untyped atomic messages to LSC when available
Bspec programming note metions that "Atomic messages are always forced
to "un-cacheable" in the L1 cache". We can make the L1 cache
un-cacheable and L3 with write-back policy.

v2: (Sagar Ghuge):
 - Fix caching policy for atomic messages
 - Fix simd exec size

v3: (Sagar Ghuge):
 - Add atomic messages to brw_schedule_instructions

v4: (Jason Ekstrand):
 - Rebase on lsc_msg_desc reworks

Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Mark Janes
4f86a70599 intel/fs: Lower DW untyped r/w messages to LSC when available
This puts the basic infrastructure in place for lowering logical
dataport messages to LSC messages.  We start with the two most obvious
opcodes and add more in later patches.

v2 (Sagar Ghuge):
 - Pass required params to message desc
 - Remove duplicate mlen calculation
 - Change commit message.

v3 (Jason Ekstrand):
 - Drop TGM support

Co-authored-by: Jason Ekstrand <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Jason Ekstrand
8d3468ad5b intel/compiler: Add LSC to messages brw_ir_performance
This adds framework only.  No opcodes.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Sagar Ghuge
b67f1ff465 intel/compiler: Add support for LSC fence operations
v2 (Jason Ekstrand):
 - Squash SLM and global fence ops together

v3 (Jason Ekstrand):
 - Rework to use message descriptors instead of instruction fields

v4 (Jason Ekstrand):
 - Don't pass BTI into back-end emit function.  Always use FLAT.

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Jason Ekstrand
89fd196f6b intel/vec4: Add support for masking pushed data
This is the vec4 equivalent of d0d039a4d3, required for proper UBO
pushing in vertex stages for Vulkan on HSW.  Sadly, the implementation
requires us to do everything in ALIGN1 mode and the vec4 instruction
scheduler doesn't understand HW_GRF <-> UNIFORM interference so it's
easier to do the whole thing in the generator.  We add an instruction
to the top of the program which just means "emit the blob" and all the
magic happens in codegen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
2021-05-19 14:38:13 +00:00