fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 00:30:13 +01:00

Author	SHA1	Message	Date
Ian Romanick	7873edee6e	intel/fs: Use specialized version of regions_overlap in opt_copy_propagation Since one of the register must always be either VGRF or FIXED_GRF, much of regions_overlap and reg_offset can be elided. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a debugoptimized build, improves performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.spir'" by -0.29% ± 0.097% (n = 5, pooled s = 0.361697). Using a release build, improves performance of compiling shaders from batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s = 0.178312). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	43cb42df7c	intel/compiler: Micro optimize inst_is_in_block This function only exists in builds with assertions, so it only matters there. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a debugoptimized build, improves performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.spir'" by -5.2% ± 0.16% (n = 5, pooled s = 0.657887). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	d47f521ee4	intel/compiler: Use NIR_PASS instead of NIR_PASS_V Reduce debug log spam by only logging the shader if a pass made some changes. This can also elide some nir_validate calls in debug builds. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	fb950a9edf	intel/compiler: Remove one overload of backend_instruction::insert_before The version that takes a list of instructions is not used. I did not do any archaeology to find out when the last user was removed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Emma Anholt	f1ea6c1b40	intel: Always call nir_lower_frexp. We have NIR lowering for Vulkan, and rely on GLSL's lowering in the frontend, but this will let us drop the GLSL lowering. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>	2023-04-06 02:32:01 +00:00
Jordan Justen	eef7a117a1	intel/compiler: Support fmul_fsign opt for fp64 when int64 isn't supported MTL supports fp64, but not int64. The fsign(double(x))*FOO optimization would try to use a 64-bit int xor operation to conditionally toggle the sign bit of the result. Since this only affects the high bit of the result, we can do a 32-bit move of the low dword, and a 32-bit xor on the high dword. Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero on MTL. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>	2023-04-05 18:48:21 +00:00
Lionel Landwerlin	e25aee8e34	intel/fs: also allow vec8+ vectorization of load_global_const_block_intel Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	a358b97c58	intel/fs: optimize uniform SSBO & shared loads Using divergence analysis, figure out when SSBO & shared memory loads are uniform and carry the data only once in register space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	275ad509c1	intel/fs: factor out lsc surface descriptor settings Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	76698f3abd	intel/fs: copy instruction sources in logical send lowering Having references to inst->src[X] when you're also modifying inst->src[X] is a recipe for disaster. Making changes to the lowering code, I've been bitten quite a few times by this, so take copies of all sources to do the lowering. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	adb8c30436	intel/fs: UNDEF fixup_nomask_control_flow temp register Ensure that the register's liveness is not expanded to loops. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	362a07db3a	intel/fs: don't consider fixup_nomask_control_flow SENDs predicate Those SENDs are still doing a full register write. We just inserted some predication for a workaround. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	34d8bfe65f	intel/fs: run VGRF compaction just before max live register accounting There are a number of instances of the dead code elimination pass that could reduce the count. For some reason this also seems to affect register allocation itself. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Ian Romanick	2016d9f46c	intel/fs: Rework the loop of opt_combine_constants that collects constants This is a bit more wordy, but it will greatly simplify some future changes. v2: Rebase on ADD3 changes. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Ian Romanick	9e4bb4bfcf	intel/fs: Refactor part of opt_combine_constants to a separate function Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Ian Romanick	593cde0432	intel/fs: Output opt_combine_constants debug to stderr It's a lot more useful to have it in the same stream with the INTEL_DEBUG=fs output. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Patrick Lerda	5d85966805	intel: fix memory leak related to brw_nir_create_passthrough_tcs() Indeed, the parameter "mem_ctx" was not processed. For instance, this issue is triggered with the crocus driver and "piglit/bin/shader_runner tests/spec/arb_tessellation_shader/execution/compatibility/tes-clip-vertex-different-from-position.shader_test -auto -fbo": SUMMARY: AddressSanitizer: 235216 byte(s) leaked in 48 allocation(s). Fixes: `96ba0344db` ("intel: Use common helpers for TCS passthrough shaders") Signed-off-by: Patrick Lerda <patrick9876@free.fr> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22173>	2023-03-30 10:52:07 +00:00
Ian Romanick	782de1932c	intel/fs: Don't copy propagate from saturate to sel There are already NIR algebraic optimizations (see also `ac6646129f` ("nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.")) that will try to remove the saturate from things like fmax(0.5, fsat(x)). This basically reverts `40aeb558ce` ("i965/fs: Allow propagation of instructions with saturate flag to sel"). That commit message had no shader-db information, so it's unclear whether this actually helped anything ever. No shader-db changes on any Intel platform. One shader in Far Cry New Dawn was affected. Cycles in all programs: 10933090738 -> 10933090736 (-0.0%) Cycles helped: 1 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>	2023-03-29 23:48:19 +00:00
Marcin Ślusarz	32107d8b5a	intel/compiler: compactify locations of mesh outputs Needed in support of anv code for Wa_14015590813. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17622>	2023-03-29 18:35:55 +00:00
Faith Ekstrand	789992b7c9	intel: Drop some author comments and update Faith's name Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22120>	2023-03-26 00:16:25 +00:00
Sagar Ghuge	cece2aa2c1	intel/compiler: Add Wa_14014063774 for slm_fence Before an SLM fence, the compiler needs to insert SYNC.ALLWR in order to avoid the SLM data race. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22050>	2023-03-25 00:45:04 +00:00
Mark Janes	33d03e57ad	intel/fs: use generated helpers for Wa_14013363432 / Wa_14012688258 Wa_14013363432 is a clone of Wa_14012688258. It does not apply to all gfx 12.5 platforms. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21745>	2023-03-23 19:13:09 +00:00
Tapani Pälli	6538c5bcd4	intel/fs: restore message layout changes for cube array This reverts commit `bc04e2daca` that handled the change as a WA while this is about a new feature, change done in message layout. Patch also changes the original comment to not refer to Wa but bspec page. Fixes: `bc04e2daca` ("intel/fs: use generated helpers for Wa_1209978020 / Wa_18012201914") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <markjanes@swizzler.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22068>	2023-03-22 20:18:11 +00:00
Lionel Landwerlin	2acc2f18ea	intel/compiler: report max dispatch width statistic Most tools looking at shader stats assume that there is only a single resulting binary shader out of a single input. On Intel HW this is not always the case. So having a statistic on each variant that reports the maximum dispatch width helps show improvement on a single shader in terms of how large we manage to compile it. For shaders that can be compiled in multiple SIMD widths (like fragment shaders), this will report the maximum dispatch width in the statistics of each variant. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22014>	2023-03-21 11:53:04 +00:00
Iván Briano	4dd81b4e2f	intel/fs: handle interpolation modes for at_sample and at_offset too Fixes dEQP-VK.draw..linear_interpolation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19647>	2023-03-18 10:18:15 +00:00
Francisco Jerez	76b4255cd8	intel/fs: Fix register coalesce in presence of force_writemask_all copy source writes. This fixes the behavior of register coalesce in cases where the source of a copy is written elsewhere in the program by a force_writemask_all instruction, which could cause the overwrite to be executed for an inactive channel under non-uniform control flow, causing can_coalesce_vars() to give incorrect results. This has been reported in cases like: > while (true) { > x = imageSize(img); > if (non_uniform_condition()) { > y = x; > break; > } > } > use(y); Currently the register coalesce pass would coalesce x and y in the example above, which is invalid since in the example above imageSize() is implemented as a force_writemask_all SEND message, whose result is broadcast to all channels, so when a given channel executes 'y = x' and breaks out of the loop, another divergent channel can execute a subsequent iteration of the loop overwriting 'x' with a different value, hence coalescing y and x into the same register changes the behavior of the program. Note that this is a regression introduced by commit `a4b36cd3dd`. In order to avoid the problem without reverting that patch, we prevent register coalesce if there is an overwrite of the source with force_writemask_all behavior inconsistent with the copy and this occurs anywhere in the intersection of the live ranges of source and destination, even if it occurs lexically before the copy, since it might be physically executed after the copy under divergent loop control flow. Fixes: `a4b36cd3dd` ("intel/fs: Coalesce when the src live range is contained in the dst") Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>	2023-03-17 03:05:24 -07:00
Francisco Jerez	d4015bcb38	intel/fs: Fix copy propagation dataflow analysis in presence of force_writemask_all ACP overwrites. This fixes the behavior of copy propagation in cases where either the source or destination of an ACP is overwritten elsewhere in the program by a force_writemask_all instruction, which could cause the overwrite to be executed for an inactive channel under non-uniform control flow, causing the current per-channel dataflow propagation to give incorrect results. This has been reported in cases like: > while (true) { > x = imageSize(img); > if (non_uniform_condition()) { > y = x; > break; > } > } > use(y); Currently the copy propagation pass would propagate copy 'y = x' into 'use(y)', which is invalid since in the example above imageSize() is implemented as a force_writemask_all SEND message, whose result is broadcast to all channels, so when a given channel executes 'y = x' and breaks out of the loop, another divergent channel can execute a subsequent iteration of the loop overwriting 'x' with a different value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior of the program. This patch extends the global dataflow analysis algorithm to determine whether there is any control flow path from a given copy to an overwrite of its source or destination which has force_writemask_all behavior inconsistent with the copy, and in such case prevents copy propagation for that ACP entry at any point of the program which can be reached from the overwrite, even if the copy is statically re-executed along all such control flow paths (as in the example above), since the execution of the overwrite for a given channel i may corrupt other channels j!=i inactive for the subsequently re-executed copy. 
Note that a simpler solution has been attempted which fully shuts down copy propagation if such a force_writemask_all ACP overwrite is present /anywhere/ in the program regardless of its location in the control flow graph, however that led to large shader-db regressions in some programs from shader-db (like a CS from Car Chase which would emit 53% more instructions). With this solution the only handful of shaders that suffer instruction count regressions seem to be getting misoptimized right now (e.g. some compute shaders from Deus Ex Mankind). This solution doesn't seem to affect the run-time of shader-db significantly, it's less than 1% higher with the fix applied. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>	2023-03-17 03:05:20 -07:00
Francisco Jerez	1c1be23497	intel/fs: Track force_writemask_all behavior of copy propagation ACP entries. force_writemask_all determines whether all channels of the copy are actually valid, and may be required to be set for it to be propagated safely in cases where the destination of the copy is used by another force_writemask_all instruction, or when the copy occurs in a divergent control flow block different from its use. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>	2023-03-17 03:05:18 -07:00
Kenneth Graunke	14f9f98dcb	i965/vec4: Implement uclz in the vec4 backend Commit `28311f9d02` moved ufind_msb lowering to NIR and started emitting uclz. Unfortunately, the vec4 backend never actually implemented uclz. It's trivial to do. Now it does. Fixes: `28311f9d02` ("nir: intel/compiler: Move ufind_msb lowering to NIR") Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>	2023-03-17 09:01:18 +00:00
Kenneth Graunke	e7ea2aa46c	intel/fs: Make bld.F16TO32 actually emit F16TO32 not F32TO16 Ahem, "add builder helpers that work on Gfx7"...now might actually work. Too much copy and paste... Fixes: `966995d911` ("intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x") Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>	2023-03-17 09:01:18 +00:00
Kenneth Graunke	84197bc0a4	intel/vec4: Retype texture/sampler indexes to UD generate_tex() asserts that sampler_index.type == UD, but commit `83fd7a5ed1` removed the uint temporary, which caused us to see D at some points. Really, either should be fine, but let's just put the UD retype back. This fixes a ton of things in crocus. Fixes: `83fd7a5ed1` ("intel: Use nir_lower_tex_options::lower_index_to_offset") Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>	2023-03-17 09:01:18 +00:00
Lionel Landwerlin	56474fae93	intel/fs: fix subgroup invocation read bounds checking nir->info.subgroup_size can be set to an enum : SUBGROUP_SIZE_VARYING = 0 SUBGROUP_SIZE_UNIFORM = 1 SUBGROUP_SIZE_API_CONSTANT = 2 SUBGROUP_SIZE_FULL_SUBGROUPS = 3 So compute the API subgroup size value and compare it to the dispatch size to determine whether we need some bound checking. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ac192d79d` ("intel/fs: bound subgroup invocation read to dispatch size") Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>	2023-03-14 12:15:48 +00:00
Lionel Landwerlin	bf59cfcee1	intel/fs: prevent large vector ops generated by peephole_ffma Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>	2023-03-14 10:38:50 +00:00
Lionel Landwerlin	bc08f43991	intel/fs: add MOV source count validation Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>	2023-03-14 10:38:50 +00:00
Lionel Landwerlin	ed3c2f73db	intel/fs: fixup sources number from opt_algebraic Fixes issues with register_coalesce : fossilize-replay: brw_fs_register_coalesce.cpp:297: bool fs_visitor::register_coalesce(): Assertion `mov[i]->sources == 1' failed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>	2023-03-14 10:38:50 +00:00
Lionel Landwerlin	18bdc71459	intel/fs: fix nir_opt_peephole_ffma max vec assumption There can be larger vec than vec4. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>	2023-03-14 10:38:50 +00:00
Lionel Landwerlin	efde1917c9	intel/fs: don't SEND messages as partial writes For instance, to load uniform data with the LSC we usually rely on transpose messages which have to execute in SIMD1. Those end up being considered as partial writes, so within loops their life span spreads to the whole loop, increasing register pressure. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21867>	2023-03-14 10:10:32 +00:00
Ian Romanick	28311f9d02	nir: intel/compiler: Move ufind_msb lowering to NIR Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Cycles in all programs: 9098346105 -> 9098333765 (-0.0%) Cycles helped: 6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	08ca862ef8	intel/compiler: Tighter src and dest size bounds checking for some opcodes Enforce the sizes listed in the Skylake PRM: BFREV: source types: D destination types: D CBIT: source types: UB, UW, UD destination types: UD FBH: source types: D, UD destination types: UD FBL: source types: UD destination types: UD LZD: source types: D, UD destination types: UD v2: Update BFREV commit message documentation. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	0cc7bf63b7	nir: intel/compiler: Move ifind_msb lowering to NIR Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so no @32 annotation is required. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	15c6c859cf	intel/compiler: Lower find_lsb in NIR No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Eric Engestrom	f5d3d1e7ed	meson: inline gtest_test_protocol now that it's always 'gtest' Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21485>	2023-03-10 07:20:29 +00:00
Sagar Ghuge	9a34b2ab0e	intel/compiler: Add swsb_stall debug option When enabled, on gfx12 plus, we will add the sync nop instruction after each instruction to make sure that current instruction depends on the previous instruction explicitly. This option will help us to get a hint if something is missing or broken in software scoreboard pass. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>	2023-03-10 06:55:39 +00:00
Kenneth Graunke	dfe652fb03	intel/eu: Simplify brw_F32TO16 and brw_F16TO32 Now that we aren't using them on Gfx8+ we can drop a lot of cruft. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	c590a3eadf	intel/fs: Move packHalf2x16 handling to lower_pack() This mainly lets the software scoreboarding pass correctly mark the instructions, without needing to resort to fragile manual handling in the generator. We can also make small improvements. On Gfx 8LP-12.0, we no longer have the restrictions about DWord alignment, so we can simply write each half into its intended location, rather than writing it to the low DWord and then shifting it in place. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	f5e5705c91	intel/fs: Use F32TO16/F16TO32 helpers in fquantize16 handling I originally thought that we were intentionally emitting the legacy opcodes here to make them opaque to the optimizer, so that it wouldn't eliminate the explicit type conversions, as they're actually required to do the quantization. But...we don't actually optimize those away currently anyway. So...go ahead and use the helpers for consistency. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	44c6ccb197	Revert "intel/fs: Fix inferred_sync_pipe for F16TO32 opcodes" With the previous patch, we no longer need to special case this, as we emit a MOV with an HF source, rather than F16TO32 with an UW source, on all platforms that need scoreboarding. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	309ec3725a	intel/fs: Use new F16TO32 helpers for unpack_half_split_* opcodes This gets us a MOV at the IR level on Gfx8+ which should be more optimizable than F16TO32. It also removes confusion about which pipe the instruction will run on. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	78bf53904e	intel/fs: Delete a TODO about using brw_F32TO16. We can just use the new builder helpers to get the optimization advantages of a MOV on Gfx8+ while also getting the necessary F32TO16 on Gfx7.x and yet not worry too hard about it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	966995d911	intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x These take care of emitting the F32TO16/F16TO32 instructions on Gfx7.x but otherwise just emit a type converting MOV on Gfx8+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
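
The sign-bit handling described in `eef7a117a1` (fmul_fsign for fp64 when int64 isn't supported) can be sketched in plain C. This is an illustration only, not the Mesa implementation: the function names are hypothetical, and the actual change operates on backend IR instructions rather than on C values. The sketch assumes IEEE 754 binary64 doubles, where the sign is the top bit of the high dword.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: toggle the sign with a single 64-bit XOR.
 * sign_mask_hi is either 0x80000000 (flip the sign) or 0 (leave it). */
static double toggle_sign_wide(double value, uint32_t sign_mask_hi)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    bits ^= (uint64_t)sign_mask_hi << 32;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical split version: only 32-bit operations touch the data.
 * The low dword is moved unchanged and the XOR is applied to the
 * high dword, which is where the sign bit lives. */
static double toggle_sign_split(double value, uint32_t sign_mask_hi)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);

    uint32_t lo = (uint32_t)bits;                         /* 32-bit move */
    uint32_t hi = (uint32_t)(bits >> 32) ^ sign_mask_hi;  /* 32-bit xor  */

    bits = ((uint64_t)hi << 32) | lo;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Both versions should produce bit-identical results. */
static void check_split_matches_wide(double value)
{
    double a = toggle_sign_wide(value, 0x80000000u);
    double b = toggle_sign_split(value, 0x80000000u);
    assert(memcmp(&a, &b, sizeof a) == 0);
}
```

Comparing the raw bits instead of the double values keeps the check meaningful for inputs such as -0.0 and NaN, where `==` would not tell the two results apart reliably.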


2524 commits