fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Ian Romanick	ef3dc401da	brw: Add devinfo parameter to fs_inst::regs_read This isn't used now, but future commits will add uses. Doing this as a separate commit removes a lot of "just typing" churn from commits that have real changes to review. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Francisco Jerez	e2eba3c7da	intel/brw/xe2+: Adjust performance analysis divergence weight due to EU fusion removal. This reduces the penalty the heuristic gives to SIMD32 shaders relative to SIMD16 in presence of discard control flow on Xe2+. The penalty was meant to account for the inefficient divergence behavior of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs bundled in groups of two, and each pair shared control flow logic so both EUs could only execute instructions in lockstep, which meant that SIMD32 shaders had an effective warp size of 64 on Gfx12.x. This change switches back to more optimistic modelling of discard divergence. With it we gain about 6% performance in a Shadow of the Tomb Raider trace (tested on BMG). One may wonder if there are still workloads that would suffer materially from enabling SIMD32 for all pixel shaders on Xe2 instead of using this heuristic, since Xe2 EUs have twice the GRF space, twice the FPU throughput and better divergence behavior than Xe, but the answer seems to be yes unfortunately: E.g. Superposition has some pixel shaders where SIMD32 has substantially worse scheduling due to the increased number of false dependencies due to higher register pressure, and using SIMD32 for them reduces performance significantly. The heuristic seems to model this correctly so it doesn't look like we can do without it at least right now on Xe2. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>	2024-10-24 22:06:52 +00:00
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Kenneth Graunke	f04bb49465	intel/brw: Delete SAD2 and SADA2 opcodes These were removed with Icelake. While they technically still exist on Skylake, which this compiler supports, we have never used these opcodes in the 14 years we could have done so. So just scrap them. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29665>	2024-06-10 16:47:50 -07:00
Lionel Landwerlin	724bb7fa15	brw: better model READ_ARF_REG opcode This opcode gets translated to 2 ALU instructions with dependency ALU stall. This change reproduces the FS_OPCODE_PACK_HALF_2x16_SPLIT values which is another opcode that generates 2 instructions. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>	2024-05-31 20:22:27 +00:00
Lionel Landwerlin	d8b78924c5	brw: use a single virtual opcode to read ARF registers In `2c65d90bc8` I forgot to add the new SHADER_OPCODE_READ_MASK_REG opcode to the list of barrier instructions in the scheduler. Let's just use a single opcode for all ARF registers that need special scoreboarding and put the register as source (nicer for the debug output). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `2c65d90bc8` ("intel/brw: ensure find_live_channel don't access arch register without sync") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>	2024-05-31 20:22:27 +00:00
Lionel Landwerlin	2c65d90bc8	intel/brw: ensure find_live_channel don't access arch register without sync Another architecture register that requires some care before reading. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `49ee3ae9e8` ("intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+") Tested-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29319>	2024-05-24 07:26:17 +00:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	007d891239	intel/brw: Use newer brw_type_is_* shorter names Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	d5b8cec7a2	intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN We no longer support the old LINE+MAC lowering, and we already lower this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always corresponds to PLN. The only catch is that LINTERP's operands are reversed from PLN, so we have to switch them. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>	2024-04-16 02:14:49 +00:00
Kenneth Graunke	12b0e03bd2	intel/brw: Use SHADER_OPCODE_SEND for coherent framebuffer reads We already have a logical opcode and lower to what is basically a send instruction. We just weren't using SHADER_OPCODE_SEND, instead having extra redundant infrastructure for no real gain. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>	2024-04-16 02:14:49 +00:00
Kenneth Graunke	97bf3d3b2d	intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND There's no need for special handling here, it's just a send message with a trivial g0 header and descriptor. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>	2024-03-05 11:16:20 +00:00
Kenneth Graunke	ad37622a8f	intel/brw: Delete legacy texture opcodes We first generate the logical opcodes, and these days fully lower to SHADER_OPCODE_SEND. In the past, we lowered to a non-logical variant and handled that in the generator. These days, we were just using the non-logical opcodes as an awkward intermediate opcode change during the lowering...which isn't really necessary at all. This patch eliminates them by using the original logical opcodes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>	2024-03-01 22:19:51 +00:00
Kenneth Graunke	45a5e4c0c4	intel/brw: Delete SHADER_OPCODE_TXF_UMS Nothing seems to generate this anymore. I guess we always use CMS. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>	2024-03-01 22:19:51 +00:00
Kenneth Graunke	601ef12467	intel/brw: Delete SHADER_OPCODE_TXF_CMS[_LOGICAL] We always use the wide variant (_W) on hardware this compiler supports. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>	2024-03-01 22:19:50 +00:00
Caio Oliveira	fb1d871714	intel/brw: Fold backend_reg into fs_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>	2024-03-01 17:52:09 +00:00
Caio Oliveira	db322554a7	intel/brw: Use fs_inst explicitly in various passes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>	2024-02-29 20:47:48 -08:00
Caio Oliveira	559d94cd0d	intel/brw: Use fs_visitor instead of backend_shader in various passes And since we are touching them, rename a couple of passes to follow same name convention as existing ones. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:05 +00:00
Caio Oliveira	8f3c52c1da	intel/brw: Remove MRF type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5c93a0e125	intel/brw: Remove Gfx8- remaining opcodes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	071e9f49f1	intel/brw: Remove F16TO32 and F32TO16 opcodes These are done with MOVs and appropriate types in Gfx9+. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	91c05d990a	intel/brw: Remove Gfx8- code from IR performance analysis Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	c621f75e7b	intel/brw: Remove now unused vec4-only opcodes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	a641aa294e	intel/brw: Remove vec4 backend It still exists as part of ELK for older gfx versions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:37 +00:00
Ian Romanick	8fb37ef985	intel/fs: Add fast path for ballot(true) This doesn't help very much now. A later commit adds a NIR optimization pass, tentatively called nir_opt_uniform_subgroup, that converts many kinds of subgroup operations to things involving bitCount(ballot(true)). This commit makes a huge difference in the results of that later commit. No shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165558033 -> 165557519 (-0.00%) Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00% Totals from 299 (0.05% of 656117) affected shaders: Instrs: 88293 -> 87779 (-0.58%) Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03% v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 08:37:46 -08:00
Sagar Ghuge	6f0ab5e4d5	intel/compiler: Add texture gather offset LOD/Bias message support v2: (Ian) - Space formatting on conditional statement Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Sagar Ghuge	79af0ac29a	intel/compiler: Add gather4_i/l/[_c]/b sampler message v2: (Ian) - Format comment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Francisco Jerez	6efcba9e36	intel/ir/xe2+: Add support for 32 SBID tokens to performance model. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27165>	2024-01-20 19:55:31 +00:00
Ian Romanick	e666872c75	intel/compiler: Initial bits for DPAS instruction v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix overlapping register allocation (via has_source_and_destination_hazard). Fix incorrect destination register file encoding. v3: Prevent lower_regioning from trying to "fix" DPAS sources. v4: Add instruction latency information for scheduling and perf estimates. v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update the comment in fs_inst::has_source_and_destination_hazard. Suggested by Caio. v6: Add some comments near the src2 calculation in fs_inst::size_read. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Francisco Jerez	23e14a6c27	intel/eu/xe2+: Add definition for size of GRF space on Xe2. And use it in various places in the compiler that require knowledge about the size of the register file. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:17:24 -08:00
Francisco Jerez	421d43fe62	intel/fs/xe2+: Fixes for increased accumulator register width. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Jason Ekstrand	714a291673	intel/compiler: Use SHADER_OPCODE_SEND for PI messages Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Lionel Landwerlin	13cca48920	intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more generic sends and drop this internal opcode. The idea behind this change is to allow bindless surfaces to be used for UBO pulls and why it's interesting to be able to reuse setup_surface_descriptors(). But that will come in a later change. No shader-db changes on TGL & DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>	2023-01-26 11:26:53 +00:00
Ian Romanick	bbcb881f46	intel/fs: Remove non-_LOGICAL URB messages The _LOGICAL versions are lowered direct to SEND, so nothing can ever generate these messages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Ian Romanick	bdc7668008	intel/fs: Lower URB messages to SEND Before rebasing on top of Ken's split-SEND optimization (see !17018), this commit just caused some scheduling changes in various tessellation and geometry shaders. These changes were caused by the addition of real latency information for the URB messages. With the addition of the split-SEND optimization, the changes are... staggering. All of the shaders helped for spills and fills are vertex shaders from Batman Arkham Origins. What surprises me is that these shaders account for such a high percentage of the spills and fills in fossil-db. 85%?!? v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion. Suggested by Ken. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20013625 -> 19954020 (-0.30%) instructions in affected programs: 4007157 -> 3947552 (-1.49%) helped: 31161 HURT: 0 helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2 helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83% 95% mean confidence interval for instructions value: -1.97 -1.86 95% mean confidence interval for instructions %-change: -2.22% -2.18% Instructions are helped. total cycles in shared programs: 859337569 -> 858636788 (-0.08%) cycles in affected programs: 74168298 -> 73467517 (-0.94%) helped: 13812 HURT: 16846 helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4 helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02% HURT stats (abs) min: 1 max: 1543 x̄: 26.31 x̃: 14 HURT stats (rel) min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58% 95% mean confidence interval for cycles value: -55.10 9.39 95% mean confidence interval for cycles %-change: 0.62% 0.77% Inconclusive result (value mean confidence interval includes 0). 
Broadwell total cycles in shared programs: 904844939 -> 904832320 (<.01%) cycles in affected programs: 525360 -> 512741 (-2.40%) helped: 215 HURT: 4 helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39 helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04% HURT stats (abs) min: 79 max: 79 x̄: 79.00 x̃: 79 HURT stats (rel) min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43% 95% mean confidence interval for cycles value: -75.02 -40.22 95% mean confidence interval for cycles %-change: -2.37% -1.81% Cycles are helped. No shader-db changes on any older Intel platforms. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 142622800 -> 141461114 (-0.8%) Instructions helped: 197186 Cycles in all programs: 9101223846 -> 9099440025 (-0.0%) Cycles helped: 37963 Cycles hurt: 151233 Spills in all programs: 98829 -> 13695 (-86.1%) Spills helped: 2159 Fills in all programs: 128142 -> 18400 (-85.6%) Fills helped: 2159 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Ian Romanick	b909ac350f	intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix An argument could be made that all stage-specific opcodes for vec4 stages should be prefixed with VEC4_ like the stage-agnostic opcodes. I'll leave those additional sed jobs for another day. egrep -lr '(VS|GS|TCS)_OPCODE_URB_WRITE' src | while read f; do sed --in-place 's/\(VS\|GS\|TCS\)_OPCODE_URB_WRITE/VEC4_\1_OPCODE_URB_WRITE/g' $f; done egrep -lr 'T.S_OPCODE[_A-Z]*URB_OFFSETS' src | while read f; do sed --in-place 's/\(T.S_OPCODE[_A-Z]*URB_OFFSETS\)/VEC4_\1/g' $f; done Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Kenneth Graunke	72e9843991	intel/compiler: Introduce a new brw_isa_info structure This structure will contain the opcode mapping tables in the next commit. For now, this is the mechanical change to plumb it into all the necessary places, and it continues simply holding devinfo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	6fa66ac228	intel/compiler: Implement nir_intrinsic_last_invocation We haven't exposed this intrinsic as it doesn't directly correspond to anything in SPIR-V. However, it's used internally by some NIR passes, namely nir_opt_uniform_atomics(). We reuse most of the infrastructure in brw_find_live_channel, but with LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Lionel Landwerlin	3dabe93257	intel/fs: rework dss_id opcode into generic opcode We'll want different types of IDs based on topology. Let's make this more flexible and also move the bit shifting code a layer above where it's easier to do bitshifting operations, especially if you need to stash things into temporary registers. v2: Keep previous comment. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>	2022-02-08 12:55:24 +00:00
Kenneth Graunke	7325179bcb	intel/compiler: Use uppercase enum values in brw_ir_performance.cpp This is by far the more common style in Mesa. It also gives a cue that e.g. num_dependency_ids is a fixed definition rather than some kind of local variable maintaining a count. While here, we also rename the enums to have full prefixes to prepare for a future where we use them in multiple files for future backend work. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14182>	2021-12-16 09:00:57 +00:00
Lionel Landwerlin	361b3fee3c	intel: move away from booleans to identify platforms v2: Drop changes around GFX_VERx10 == 75 (Luis) v3: Replace (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT) by (devinfo->platform == INTEL_PLATFORM_IVB) Replace (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X) by (devinfo->verx10 >= 45) Replace (devinfo->platform != INTEL_PLATFORM_G4X) by (devinfo->verx10 != 45) v4: Fix crocus typo v5: Rebase v6: Add GFX3, ILK & I965 platforms (Jordan) Move ifdef to code expressions (Jordan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>	2021-11-08 16:48:06 +00:00
Jason Ekstrand	e6a9501aa2	intel/fs: Add the URB fence message When they re-arranged all the dataport stuff and added the LSC, doing URB fencing through the dataport no longer makes sense. Instead, there is now a fence message on the URB shared function. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>	2021-09-29 20:52:54 +00:00
Ian Romanick	0f809dbf40	intel/compiler: Basic support for DP4A instruction v2: Very significant rebase on changes to previous commits. Specifically, brw_fs_nir.cpp changes were pretty much rewritten from scratch after changing the NIR opcode names and types. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Dave Airlie	8a81d14271	intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 This is the equivalent of idr's intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 except for the vec4 backend. This fixes buggy rendering seen with crocus on a qt trace. v2 (idr): Trivial whitespace change. Add unit tests. v3: Fix type in comment in unit tests. Noticed by Jason and Priit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Iron Lake total instructions in shared programs: 8183077 -> 8184543 (0.02%) instructions in affected programs: 198990 -> 200456 (0.74%) helped: 0 HURT: 1355 HURT stats (abs) min: 1 max: 8 x̄: 1.08 x̃: 1 HURT stats (rel) min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.96% 1.03% Instructions are HURT. total cycles in shared programs: 238967672 -> 238962784 (<.01%) cycles in affected programs: 4666014 -> 4661126 (-0.10%) helped: 406 HURT: 314 helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18 helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65% HURT stats (abs) min: 2 max: 112 x̄: 13.48 x̃: 12 HURT stats (rel) min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16% 95% mean confidence interval for cycles value: -8.60 -4.98 95% mean confidence interval for cycles %-change: -0.87% -0.49% Cycles are helped. GM45 total instructions in shared programs: 4986888 -> 4988354 (0.03%) instructions in affected programs: 198990 -> 200456 (0.74%) helped: 0 HURT: 1355 HURT stats (abs) min: 1 max: 8 x̄: 1.08 x̃: 1 HURT stats (rel) min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.96% 1.03% Instructions are HURT. 
total cycles in shared programs: 153577826 -> 153572938 (<.01%) cycles in affected programs: 4666014 -> 4661126 (-0.10%) helped: 406 HURT: 314 helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18 helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65% HURT stats (abs) min: 2 max: 112 x̄: 13.48 x̃: 12 HURT stats (rel) min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16% 95% mean confidence interval for cycles value: -8.60 -4.98 95% mean confidence interval for cycles %-change: -0.87% -0.49% Cycles are helped. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>	2021-08-11 13:09:32 -07:00
Ian Romanick	38807ceeae	intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented using a separate cmpn and sel instruction. This lowering occurs in fs_visitor::lower_minmax which is called very, very late... a long, long time after the first calls to opt_cmod_propagation. As a result, conditional modifiers can be incorrectly propagated across sel.cond on those platforms. No tests were affected by this change, and I find that quite shocking. After just changing flags_written(), all of the atan tests started failing on ILK. That required the change in cmod_propagation (and the addition of the prop_across_into_sel_gfx5 unit test). Shader-db results for ILK and GM45 are below. I looked at a couple before and after shaders... and every case that I looked at had experienced incorrect cmod propagation. This affected a LOT of apps! Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2, Gang Beasts, and on and on... :( I discovered this bug while working on a couple new optimization passes. One of the passes attempts to remove condition modifiers that are never used. The pass made no progress except on ILK and GM45. After investigating a couple of the affected shaders, I noticed that the code in those shaders looked wrong... investigation led to this cause. v2: Trivial changes in the unit tests. v3: Fix type in comment in unit tests. Noticed by Jason and Priit. v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason. 
Fixes: `df1aec763e` ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Dave Airlie <airlied@redhat.com> Iron Lake total instructions in shared programs: 8180493 -> 8181781 (0.02%) instructions in affected programs: 541796 -> 543084 (0.24%) helped: 28 HURT: 1158 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50% HURT stats (abs) min: 1 max: 3 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23% 95% mean confidence interval for instructions value: 1.06 1.11 95% mean confidence interval for instructions %-change: 0.31% 0.38% Instructions are HURT. total cycles in shared programs: 239420470 -> 239421690 (<.01%) cycles in affected programs: 2925992 -> 2927212 (0.04%) helped: 49 HURT: 157 helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70 helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96% HURT stats (abs) min: 2 max: 48 x̄: 27.34 x̃: 24 HURT stats (rel) min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20% 95% mean confidence interval for cycles value: -0.80 12.64 95% mean confidence interval for cycles %-change: -0.31% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4985517 -> 4986207 (0.01%) instructions in affected programs: 306935 -> 307625 (0.22%) helped: 14 HURT: 625 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49% HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1 HURT stats (rel) min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.29% 0.36% Instructions are HURT. 
total cycles in shared programs: 153827268 -> 153828052 (<.01%) cycles in affected programs: 1669290 -> 1670074 (0.05%) helped: 24 HURT: 84 helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67 helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94% HURT stats (abs) min: 2 max: 48 x̄: 27.71 x̃: 24 HURT stats (rel) min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14% 95% mean confidence interval for cycles value: -1.94 16.46 95% mean confidence interval for cycles %-change: -0.29% 0.11% Inconclusive result (value mean confidence interval includes 0). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>	2021-08-11 13:09:20 -07:00
Sagar Ghuge	705285b9f4	intel/compiler: Add support for ternary add instruction on XeHP v2: - Re-arrange opcode in correct order (Matt Turner) - Move ADD3 case closer to LRP (Jason) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>	2021-07-16 15:59:56 +00:00
Lionel Landwerlin	91dcbf1f56	intel/compiler: Track latency/perf of LSC fences Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11759>	2021-07-12 11:39:03 +00:00
Sagar Ghuge	621cf9b1df	intel/fs: Lower Byte scattered r/w messages to LSC when available v2 (Jason Ekstrand): - Squash in brw_scheduler changes - Update brw_ir_performance Co-authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00
Sagar Ghuge	8f82c8aa1a	intel/fs: Lower untyped float atomic messages to LSC when available Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00

73 commits