fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 11:10:10 +01:00

Author	SHA1	Message	Date
Rohan Garg	3a8f5c2783	intel: update comments about non-existent function parameter Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>	2023-05-18 15:46:06 +02:00
Rohan Garg	a15cc833f9	intel: drop unused is_scalar function parameter in brw_nir_apply_key Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>	2023-05-18 15:46:06 +02:00
Rohan Garg	212810ac8a	intel: infer scalar'ness locally for brw_postprocess_nir Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>	2023-05-18 15:46:06 +02:00
Kenneth Graunke	78a195f252	intel/compiler: Postpone most int64 lowering to brw_postprocess_nir Float conversions continue to be lowered early at the same time as nir_lower_doubles, which we run early so we don't have to run it for every shader key variant. However, all other int64 lowering is now done late, after nir_opt_load_store_vectorize(), allowing it to comprehend basic arithmetic on 64-bit addresses. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23064>	2023-05-18 10:48:50 +00:00
Alyssa Rosenzweig	c7861fe1f2	nir: Drop unused argument from nir_ssa_dest_init_for_type Similar to nir_ssa_dest_init, but with fewer call sites to churn through. This was done with the help of Coccinelle: @@ expression A, B, C, D; @@ -nir_ssa_dest_init_for_type(A, B, C, D); +nir_ssa_dest_init_for_type(A, B, C); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>	2023-05-17 23:46:16 +00:00
Alyssa Rosenzweig	01e9ee79f7	nir: Drop unused name from nir_ssa_dest_init Since `624e799cc3` ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA defs don't have names, making the name argument unused. Drop it from the signature and fix the call sites. This was done with the help of the following Coccinelle semantic patch: @@ expression A, B, C, D, E; @@ -nir_ssa_dest_init(A, B, C, D, E); +nir_ssa_dest_init(A, B, C, D); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>	2023-05-17 23:46:16 +00:00
Alyssa Rosenzweig	c323762f9f	treewide: Stop lowering legacy atomics There are no more producers of legacy atomics so these calls are inert. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23036>	2023-05-16 22:36:21 +00:00
Alyssa Rosenzweig	e7bb53467b	intel: Produce unified atomics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23036>	2023-05-16 22:36:21 +00:00
Lionel Landwerlin	952a523abb	intel: switch over to unified atomics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23004>	2023-05-15 16:32:21 +00:00
Konstantin Seurer	0cf22f9af3	nir: Make rq_load committed src an index committed has to be a constant so there is no need to have a src and depend on constant folding to remove the i2b. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22963>	2023-05-14 17:28:40 +00:00
Lionel Landwerlin	b4b17f8aaa	Revert "intel/compiler: make uses_pos_offset a tri-state" This reverts commit `5489033fa8`. The problem I was trying to address is that we were programming the 3DSTATE_PS::PositionXYOffsetSelect bit differently with GPL (CENTROID) than without (NONE). I failed to understand that this bit also impacts the thread payload layout. GPL fragment shaders don't know ahead of time if pos_offset is going to be used. It'll be chosen at runtime based on push constant bits. So we need to program this bit differently just to have a payload matching the compiled shader code. This fixes the freedoom replay with GPL FS shader in SIMD32. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22938>	2023-05-11 08:01:46 +00:00
Kenneth Graunke	f00143acc3	intel/compiler: Fold constants after distributing source modifiers This can generate things like fneg! of load_const, which is silly. Fold those away into an actual constant. Only do so on the scalar backend because there's a comment above that the vec4 backend doesn't want any new constants this late, and I'm inclined to believe it. fossil-db stats show a very minor improvement: Totals: Instrs: 203091223 -> 203091099 (-0.00%); split: -0.00%, +0.00% Cycles: 14410638075 -> 14410577067 (-0.00%); split: -0.00%, +0.00% Totals from 20 (0.00% of 665070) affected shaders: Instrs: 27067 -> 26943 (-0.46%); split: -0.47%, +0.01% Cycles: 2687958 -> 2626950 (-2.27%); split: -2.27%, +0.00% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22881>	2023-05-09 00:16:40 -07:00
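The percentages in the fossil-db summary above are plain relative changes, (after - before) / before. A small helper (hypothetical, not part of fossil-db) reproduces the "affected shaders" figures, for example Instrs 27067 -> 26943 gives -0.46% and Cycles 2687958 -> 2626950 gives -2.27%:

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
 * Negative means the count went down, which is an improvement
 * for instruction and cycle counts. */
double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print one line in roughly the style of a fossil-db summary. */
void report(const char *what, double before, double after)
{
    printf("%s: %.0f -> %.0f (%+.2f%%)\n", what, before, after,
           pct_change(before, after));
}
```

Calling `report("Instrs", 27067, 26943)` prints `Instrs: 27067 -> 26943 (-0.46%)`. The same formula applied to the whole-database totals rounds to -0.00%, which is why only the affected-shader line shows a visible change.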
Lionel Landwerlin	fb13360546	intel/fs: reduce register usage for relocated constants Commit `bb8e31b7ed` ("anv: avoid hardcoding instruction VA constant in shaders") had a slight negative impact on shaders (Red Dead Redemption 2 in particular). Dropping a few shaders from SIMD32 to SIMD16. With this change, it brings back all the dropped SIMD32 shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22872>	2023-05-07 19:38:04 +00:00
Lionel Landwerlin	9471ffa70a	intel/fs: fix scheduling of HALT instructions With the following test : dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_out_of_bounds_load There is a : shader_start: ... <- no control flow g0 = some_alu g1 = fbl g2 = broadcast g3, g1 g4 = get_buffer_size g2 ... <- no control flow halt <- on some lanes g5 = send <surface>, g4 eliminate_find_live_channel will remove the fbl/broadcast because it assumes lane0 is active at get_buffer_size : shader_start: ... <- no control flow g0 = some_alu g4 = get_buffer_size g0 ... <- no control flow halt <- on some lanes g5 = send <surface>, g4 But then the instruction scheduler will move the get_buffer_size after the halt : shader_start: ... <- no control flow halt <- on some lanes g0 = some_alu g4 = get_buffer_size g0 g5 = send <surface>, g4 get_buffer_size pulls the surface index from lane0 in g0 which could have been turned off by the halt and we end up accessing an invalid surface handle. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20765>	2023-05-05 00:43:25 +03:00
Kenneth Graunke	9dd6fcd9ec	intel/compiler: UNDEF SubgroupInvocation's register This value takes a few instructions to create, involving expanding V-immediates, adding 8 for SIMD16, and so on. We can mark it UNDEF so that it's clear that although these are partial writes, we are actually defining the entire value. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22835>	2023-05-04 18:17:26 +00:00
Kenneth Graunke	4913f54a1f	intel/compiler: UNDEF comparisons with smaller than 32-bit Comparisons which produce 32-bit boolean results (0 or 0xFFFFFFFF) but operate on 16-bit types would first generate a CMP instruction with W or HF types, before expanding it out. This CMP is a partial write, which leads us to think the register may contain some prior contents still. When placed in a loop, this causes its live range to extend beyond its real life time. Mark the register with UNDEF first so that we know that no prior contents exist and need to be preserved. This affects: flt32, fge32, feq32, fneu32, ilt32, ult32, ige32, uge32, ieq32, ine32 On one of Cyberpunk 2077's most complex compute shaders, this reduces the maximum live registers from 696 to 537 (22.8%). Together with the next patch, Cyberpunk's spills and fills are cut by 10.23% and 9.19%, respectively. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22835>	2023-05-04 18:17:26 +00:00
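The 22.8% reduction quoted for the Cyberpunk 2077 compute shader is the relative drop in maximum live registers:

$$
\frac{696 - 537}{696} = \frac{159}{696} \approx 0.228 = 22.8\%
$$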
Lionel Landwerlin	5cdcc22736	intel/nir/rt: wire position fetch intrinsic Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <f{merge_request.web_url}>	2023-05-04 11:25:41 +00:00
Lionel Landwerlin	03f0f70adf	intel/nir/rt: use a single load for instance leaf loading Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <f{merge_request.web_url}>	2023-05-04 11:25:41 +00:00
Lionel Landwerlin	5489033fa8	intel/compiler: make uses_pos_offset a tri-state This value depends on the per-sample value which can be unknown at compile time with graphics pipeline libraries. So we need to have this dynamic as well and pick the right value when generating the 3DSTATE_PS/3DSTATE_WM packet. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d8dfd153c5` ("intel/fs: Make per-sample and coarse dispatch tri-state") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22728>	2023-05-03 10:03:57 +00:00
Lionel Landwerlin	7ddc31c672	intel/fs: fix per vertex input clamping Only apply the clamp in multi patch mode (where the input vertices vary between [1, 32]). The clamp NIR pass operates on lowered intrinsics so we need to call it after the inputs have been lowered. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `e25e17dd0c` ("intel/fs: clamp per vertex input accesses to patchControlPoints") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8912 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22701>	2023-04-27 20:29:16 +00:00
Jordan Justen	fcb72ffd0c	intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops For MTL (verx10 == 125), float64 is supported, but int64 is not. Therefore we need to lower cluster broadcast using 32-bit int ops. For gfx12.5+ platforms that support int64, the register regions used by cluster broadcast aren't supported by the 64-bit pipeline. On MTL, dEQP-VK.subgroups.clustered._double and dEQP-VK.subgroups.clustered._dvec were failing to validate the compiled shader in debug mode, and reportedly gpu-hanging in release mode. With this change dEQP-VK.subgroups.clustered._double passed all 48 tests and dEQP-VK.subgroups.clustered._dvec passed all 140 tests on MTL. Rework: * Move from generator to brw_fs_lower_regioning.cpp. (Suggested by Francisco) * Apply to verx10 >= 125.. (Suggested by Francisco) Cc: 23.1 <mesa-stable> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1) Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>	2023-04-20 11:41:10 -07:00
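The idea of lowering a 64-bit cluster broadcast to 32-bit operations can be modelled per lane: split each 64-bit value into low and high dwords, broadcast each half with a 32-bit broadcast, and recombine. This is a scalar C sketch of the semantics only, not the code in brw_fs_lower_regioning.cpp; `SIMD_WIDTH` and the helper names are hypothetical, and it assumes `idx < cluster_size` and that `cluster_size` divides `SIMD_WIDTH`.

```c
#include <stdint.h>

#define SIMD_WIDTH 16 /* dispatch width assumed for this sketch */

/* Reference semantics of a clustered broadcast: every lane in a cluster of
 * `cluster_size` consecutive lanes receives the value held by lane `idx`
 * of that same cluster. */
void cluster_broadcast_u32(uint32_t dst[SIMD_WIDTH],
                           const uint32_t src[SIMD_WIDTH],
                           unsigned idx, unsigned cluster_size)
{
    for (unsigned lane = 0; lane < SIMD_WIDTH; lane++) {
        unsigned base = lane - lane % cluster_size; /* first lane of cluster */
        dst[lane] = src[base + idx];
    }
}

/* 64-bit clustered broadcast built only from 32-bit broadcasts. */
void cluster_broadcast_u64_via_u32(uint64_t dst[SIMD_WIDTH],
                                   const uint64_t src[SIMD_WIDTH],
                                   unsigned idx, unsigned cluster_size)
{
    uint32_t lo[SIMD_WIDTH], hi[SIMD_WIDTH];
    uint32_t lo_out[SIMD_WIDTH], hi_out[SIMD_WIDTH];

    /* Split every lane into its low and high dwords. */
    for (unsigned lane = 0; lane < SIMD_WIDTH; lane++) {
        lo[lane] = (uint32_t)src[lane];
        hi[lane] = (uint32_t)(src[lane] >> 32);
    }

    /* Two independent 32-bit broadcasts. */
    cluster_broadcast_u32(lo_out, lo, idx, cluster_size);
    cluster_broadcast_u32(hi_out, hi, idx, cluster_size);

    /* Recombine the halves into 64-bit results. */
    for (unsigned lane = 0; lane < SIMD_WIDTH; lane++)
        dst[lane] = ((uint64_t)hi_out[lane] << 32) | lo_out[lane];
}
```

Because both halves are broadcast from the same source lane, the recombined value is bit-identical to a native 64-bit broadcast, while never requiring a 64-bit integer type or 64-bit register region.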
Lionel Landwerlin	d04d701cc6	intel/nir: add options to storage image lowering Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22302>	2023-04-18 08:38:55 +00:00
Lionel Landwerlin	08cf224c4a	intel/vec4: force exec_all on float control instruction Applying the same rule as the fs backend so that generation code doesn't assert. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `daa8003e45` ("intel/fs: use nomask for setting cr0 for float controls") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22473>	2023-04-14 10:54:01 +00:00
Tapani Pälli	b967cbba57	intel/compiler: use intel_needs_workaround for Wa_14012437816 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22437>	2023-04-13 07:33:50 +00:00
Tapani Pälli	ccf16693e1	intel/fs: use intel_needs_workaround for Wa_22013689345 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22437>	2023-04-13 07:33:50 +00:00
Lionel Landwerlin	daa8003e45	intel/fs: use nomask for setting cr0 for float controls The instructions manipulating cr0 use the default mask on lane0. So if for some reason that lane is disabled in some of the dispatches, we can end up not executing the instructions. Fixes flakiness in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_float_32_to_16.uniform_matrix_float_rtz_frag Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22314>	2023-04-11 11:01:31 +00:00
Kenneth Graunke	98bcf650f1	intel/compiler: Use nir_dest_bit_size() for ballot bit size check There's no guarantee that this is a SSA value. Use the helper to handle both SSA values and register correctly. Otherwise we read trash when we encounter a register and make bad decisions on types, possibly leading to our destination being UQ typed when the VGRF is only 32-bit. Fixes compilation with -Dintel-clc=enabled since `7f6491b76d` (nir: Combine if_uses with instruction uses) but the bug is much older than that, circa 2017. We were just getting lucky before. Fixes: `069bf7c907` ("i965/fs: Match destination type to size for ballot") Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22374>	2023-04-07 19:28:56 -07:00
Alyssa Rosenzweig	7f6491b76d	nir: Combine if_uses with instruction uses Every nir_ssa_def is part of a chain of uses, implemented with doubly linked lists. That means each requires 2 * 64-bit = 16 bytes per def, which is memory intensive. Together they require 32 bytes per def. Not cool. To cut that memory use in half, we can combine the two linked lists into a single use list that contains both regular instruction uses and if-uses. To do this, we augment the nir_src with a boolean "is_if", and reimplement the abstract if-uses operations on top of that list. That boolean should fit into the padding already in nir_src so should not actually affect memory use, and in the future we sneak it into the bottom bit of a pointer. However, this creates a new inefficiency: now iterating over regular uses separate from if-uses is (nominally) more expensive. It turns out virtually every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe) immediately before, so we rewrite most of the callers to instead call a new single `nir_foreach_use_including_if(_safe)` which predicates the logic based on `src->is_if`. This should mitigate the performance difference. There's a bit of churn, but this is largely a mechanical set of changes. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>	2023-04-07 23:48:03 +00:00
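The single combined use list can be sketched with an intrusive doubly linked list: each use carries an `is_if` flag, and one walk visits both instruction uses and if-uses, branching on the flag. A list head is two pointers (16 bytes on a 64-bit build), so keeping one list per def instead of two saves one head per def. The structs below are hypothetical, simplified stand-ins for `nir_src` and `nir_ssa_def`, not NIR's actual definitions.

```c
#include <stdbool.h>

/* Minimal intrusive doubly linked list with a sentinel head. */
struct list_node {
    struct list_node *prev, *next;
};

void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical stand-in for nir_src: the link must be the first member
 * so a list_node pointer can be cast back to the containing use. */
struct use_src {
    struct list_node use_link;
    bool is_if; /* true: this use is an if-condition */
};

/* Hypothetical stand-in for nir_ssa_def: one list for both kinds of use. */
struct ssa_def {
    struct list_node uses;
};

/* Single walk over all uses, predicating on is_if, in the spirit of
 * nir_foreach_use_including_if(). */
void count_uses(const struct ssa_def *def, int *n_instr, int *n_if)
{
    *n_instr = 0;
    *n_if = 0;
    for (const struct list_node *n = def->uses.next; n != &def->uses;
         n = n->next) {
        const struct use_src *src = (const struct use_src *)n;
        if (src->is_if)
            (*n_if)++;
        else
            (*n_instr)++;
    }
}
```

Callers that only care about one kind of use now pay for skipping the other kind, which is the nominal cost the commit message mentions; callers that want both get them in one pass.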
Ian Romanick	12e11fa3e4	intel/fs: White space fixes Trivial Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	6dfb7061e0	intel/fs: Preserve meta data more often in brw_nir_move_interpolation_to_top This pass rarely makes any changes, so work a little harder to preserve more meta data. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a debugoptimized build, improves performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.spir'" by -0.2% ± 0.1% (n = 5, pooled s = 0.431885). v2: Add some parentheses. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	3037603b70	intel/fs: Linked list micro optimizations in brw_nir_move_interpolation_to_top Two linked list management changes: - Use the list head sentinel as the initial cursor. It is, after all, a proper node in the list. - Iterate the list of blocks starting with the second block instead of skipping the first block in the loop. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a release build, improves performance of compiling shaders from batman_arkham_city_goty.foz by -0.24% ± 0.09% (n = 5, pooled s = 0.324106). v2: Use nir_cursor instead of direct list manipulation. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	78ee74de4a	intel/compiler: Micro optimize regions_overlap On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a release build, improves performance of compiling shaders from batman_arkham_city_goty.foz by -1.09% ± 0.084% (n = 5, pooled s = 0.354471) Reduces the size of a release build by 26k. text data bss dec hex filename 23163641 400720 231360 23795721 16b1809 before/lib64/dri/iris_dri.so 23137264 400720 231360 23769344 16ab100 after/lib64/dri/iris_dri.so Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
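The "26k" size reduction follows from the `text` column of the size output (the `data` and `bss` columns are unchanged, so the `dec` totals differ by the same amount):

$$
23163641 - 23137264 = 23795721 - 23769344 = 26377 \text{ bytes}
$$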
Ian Romanick	7873edee6e	intel/fs: Use specialized version of regions_overlap in opt_copy_propagation Since one of the registers must always be either VGRF or FIXED_GRF, much of regions_overlap and reg_offset can be elided. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a debugoptimized build, improves performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.spir'" by -0.29% ± 0.097% (n = 5, pooled s = 0.361697). Using a release build, improves performance of compiling shaders from batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s = 0.178312). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
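A specialized overlap test only needs to handle the register files the caller can actually pass, which is what allows the generic per-file logic to be dropped. The sketch below is a hypothetical model, not Mesa's `fs_reg`, `regions_overlap()` or `reg_offset()`: it assumes 32-byte GRFs, treats regions in different files as disjoint, compares offsets within a single virtual register for VGRFs, and uses absolute byte addresses for FIXED_GRFs. The core is the half-open interval test.

```c
#include <stdbool.h>

#define REG_SIZE 32u /* bytes per GRF, assumed for this sketch */

/* Only the two files the specialized caller can see. */
enum reg_file { FILE_VGRF, FILE_FIXED_GRF };

/* Hypothetical simplified register region. */
struct region {
    enum reg_file file;
    unsigned nr;     /* register number within its file */
    unsigned offset; /* byte offset of the first byte accessed */
    unsigned size;   /* number of bytes accessed */
};

/* Half-open ranges [a, a + na) and [b, b + nb) intersect iff each one
 * starts before the other ends. */
bool ranges_overlap(unsigned a, unsigned na, unsigned b, unsigned nb)
{
    return a < b + nb && b < a + na;
}

bool grf_regions_overlap(const struct region *r, const struct region *s)
{
    if (r->file != s->file)
        return false;

    if (r->file == FILE_VGRF) {
        /* Distinct virtual registers never alias. */
        if (r->nr != s->nr)
            return false;
        return ranges_overlap(r->offset, r->size, s->offset, s->size);
    }

    /* FIXED_GRF: hardware registers are contiguous, so compare absolute
     * byte addresses; a region may span several registers. */
    return ranges_overlap(r->nr * REG_SIZE + r->offset, r->size,
                          s->nr * REG_SIZE + s->offset, s->size);
}
```

Ranges that merely touch (one ends exactly where the other begins) are correctly reported as non-overlapping by the half-open test.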
Ian Romanick	43cb42df7c	intel/compiler: Micro optimize inst_is_in_block This function only exists in builds with assertions, so it only matters there. On my Ice Lake laptop (using a locked CPU speed and other measures to prevent thermal throttling, etc.) using a debugoptimized build, improves performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.spir'" by -5.2% ± 0.16% (n = 5, pooled s = 0.657887). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	d47f521ee4	intel/compiler: Use NIR_PASS instead of NIR_PASS_V Reduce debug log spam by only logging the shader if a pass made some changes. This can also elide some nir_validate calls in debug builds. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Ian Romanick	fb950a9edf	intel/compiler: Remove one overload of backend_instruction::insert_before The version that takes a list of instructions is not used. I did not do any archaeology to find out when the last user was removed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>	2023-04-06 19:07:50 +00:00
Emma Anholt	f1ea6c1b40	intel: Always call nir_lower_frexp. We have NIR lowering for Vulkan, and rely on GLSL's lowering in the frontend, but this will let us drop the GLSL lowering. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>	2023-04-06 02:32:01 +00:00
Jordan Justen	eef7a117a1	intel/compiler: Support fmul_fsign opt for fp64 when int64 isn't supported MTL supports fp64, but not int64. The fsign(double(x))*FOO optimization would try to use a 64-bit int xor operation to conditionally toggle the sign bit of the result. Since this only affects the high bit of the result, we can do a 32-bit move of the low dword, and a 32-bit xor on the high dword. Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero on MTL. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>	2023-04-05 18:48:21 +00:00
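In IEEE 754 binary64 the sign is bit 31 of the high dword, so toggling it never touches the low dword. The sketch below models the sign-toggling part with only 32-bit operations: copy the low dword of `y` and XOR its high dword with the sign bit of `x`. It covers non-zero, non-NaN `x` only; the zero and NaN behaviour of fsign is deliberately left out, and the helper names are hypothetical, not Mesa functions.

```c
#include <stdint.h>
#include <string.h>

/* Split a double into its low and high 32-bit words. */
void split_f64(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *lo = (uint32_t)bits;
    *hi = (uint32_t)(bits >> 32);
}

double join_f64(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* y with its sign flipped when x is negative, i.e. fsign(x) * y for
 * non-zero, non-NaN x, using only 32-bit integer operations. */
double mul_fsign_f64_32bit(double x, double y)
{
    uint32_t x_lo, x_hi, y_lo, y_hi;
    split_f64(x, &x_lo, &x_hi);
    split_f64(y, &y_lo, &y_hi);
    (void)x_lo; /* only x's sign bit is needed */

    uint32_t res_lo = y_lo;                        /* 32-bit move */
    uint32_t res_hi = y_hi ^ (x_hi & 0x80000000u); /* 32-bit xor */
    return join_f64(res_lo, res_hi);
}
```

For example, `mul_fsign_f64_32bit(-2.0, 3.5)` yields -3.5 and `mul_fsign_f64_32bit(5.0, 3.5)` yields 3.5.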
Lionel Landwerlin	e25aee8e34	intel/fs: also allow vec8+ vectorization of load_global_const_block_intel Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	a358b97c58	intel/fs: optimize uniform SSBO & shared loads Using divergence analysis, figure out when SSBO & shared memory loads are uniform and carry the data only once in register space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	275ad509c1	intel/fs: factor out lsc surface descriptor settings Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	76698f3abd	intel/fs: copy instruction sources in logical send lowering Having references to inst->src[X] when you're also modifying inst->src[X] is a recipe for disaster. Making changes to the lowering code, I've been bitten quite a few times by this, so take copies of all sources to do the lowering. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
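The aliasing hazard can be shown with a tiny model: if the lowering keeps a pointer into `inst->src[]` and then overwrites that slot, later reads see the new value rather than the original source. Taking value copies of the sources first avoids this. The structs and the rewrite (moving the old address into `src[1]`) are hypothetical stand-ins for what a real logical SEND lowering derives from its sources.

```c
/* Hypothetical simplified operand and instruction. */
struct operand {
    unsigned nr;
};

struct instr {
    struct operand src[2];
};

/* Buggy: `addr` points into inst->src, so once src[0] is overwritten,
 * src[1] is built from payload_nr instead of the original address. */
void lower_in_place_buggy(struct instr *inst, unsigned payload_nr)
{
    const struct operand *addr = &inst->src[0];
    inst->src[0].nr = payload_nr; /* clobbers what addr points at */
    inst->src[1].nr = addr->nr;   /* reads payload_nr, not the address */
}

/* Fixed: snapshot the source by value, then rewrite the instruction
 * only from the copy. */
void lower_with_copies(struct instr *inst, unsigned payload_nr)
{
    const struct operand addr = inst->src[0]; /* value copy */
    inst->src[0].nr = payload_nr;
    inst->src[1].nr = addr.nr;
}
```

With sources `{7, 9}` and payload register 42, the fixed version produces `{42, 7}` while the buggy version produces `{42, 42}`.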
Lionel Landwerlin	adb8c30436	intel/fs: UNDEF fixup_nomask_control_flow temp register Ensure that the register's liveness is not expanded to loops. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	362a07db3a	intel/fs: don't consider fixup_nomask_control_flow SENDs predicate Those SENDs are still doing a full register write. We just inserted some predication for a workaround. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Lionel Landwerlin	34d8bfe65f	intel/fs: run VGRF compaction just before max live register accounting There are a number of instances of the dead code elimination pass that could reduce the count. For some reason this also seems to affect register allocation itself. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Ian Romanick	2016d9f46c	intel/fs: Rework the loop of opt_combine_constants that collects constants This is a bit more wordy, but it will greatly simplify some future changes. v2: Rebase on ADD3 changes. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Ian Romanick	9e4bb4bfcf	intel/fs: Refactor part of opt_combine_constants to a separate function Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Ian Romanick	593cde0432	intel/fs: Output opt_combine_constants debug to stderr It's a lot more useful to have it in the same stream with the INTEL_DEBUG=fs output. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Patrick Lerda	5d85966805	intel: fix memory leak related to brw_nir_create_passthrough_tcs() Indeed, the parameter "mem_ctx" was not processed. For instance, this issue is triggered with the crocus driver and "piglit/bin/shader_runner tests/spec/arb_tessellation_shader/execution/compatibility/tes-clip-vertex-different-from-position.shader_test -auto -fbo": SUMMARY: AddressSanitizer: 235216 byte(s) leaked in 48 allocation(s). Fixes: `96ba0344db` ("intel: Use common helpers for TCS passthrough shaders") Signed-off-by: Patrick Lerda <patrick9876@free.fr> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22173>	2023-03-30 10:52:07 +00:00
Ian Romanick	782de1932c	intel/fs: Don't copy propagate from saturate to sel There are already NIR algebraic optimizations (see also `ac6646129f` ("nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.") that will try to remove the saturate from things like fmax(0.5, fsat(x)) This basically reverts `40aeb558ce` ("i965/fs: Allow propagation of instructions with saturate flag to sel"). That commit message had no shader-db information, so it's unclear whether this actually helped anything ever. No shader-db changes on any Intel platform. One shader in Far Cry New Dawn was affected. Cycles in all programs: 10933090738 -> 10933090736 (-0.0%) Cycles helped: 1 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>	2023-03-29 23:48:19 +00:00


3556 commits