fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-23 21:38:18 +02:00

Author	SHA1	Message	Date
Tapani Pälli	5b6718728b	intel/fs: implement Wa_14017989577 The first instruction of any kernel should have non-zero emask. This restriction needs to be obeyed to avoid GPU hangs. Patch adds a function to insert dummy mov as first instruction to make sure this requirement is fulfilled. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20194> (cherry picked from commit `bc4b7de0d0`)	2022-12-14 20:47:01 +00:00
Kenneth Graunke	d936394cf4	intel/compiler: Set NoMask on cr0 access for float controls mode This is trying to clear a bit in the control register. However, it's executing with whatever channel mask happens to be active. Typically this is the one at the start of the program, so at least some channels will be active. Typically the first channel will be active due to packed dispatch, but that's not always guaranteed. Without NoMask, the float controls writes may randomly not happen. Recent GPUs also seem to have a hang issue when the first instruction in the shader doesn't have any active channels. Having an instruction with NoMask at the start of the program works around the issue. See HSD bug 14017989577. In our case, the float controls preamble was breaking that restriction every time, causing us to run into this problem frequently. Thanks to Tapani Pälli for finding this hang issue, and Francisco Jerez and Lionel Landwerlin for helping pinpoint this issue during review of a workaround patch in !20194. Fixes GPU hangs in Elder Scrolls Online, Witcher 3, and likely more. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7639 Fixes: `9da56ffc52` ("i965/fs: add emit_shader_float_controls_execution_mode() and aux functions") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20214> (cherry picked from commit `bafbe7c23a`)	2022-12-14 20:47:01 +00:00
Lionel Landwerlin	7b5ba2d363	intel: add missing restriction on fragment simd dispatch Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7755 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Tested-by: Mark Janes <markjanes@swizzler.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20169> (cherry picked from commit `d4cd33630a`)	2022-12-14 20:47:01 +00:00
Lionel Landwerlin	e2fc0b33cd	intel: factor out dispatch PS enabling logic Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Tested-by: Mark Janes <markjanes@swizzler.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20169> (cherry picked from commit `b9403b1c47`)	2022-12-14 20:47:01 +00:00
Marcin Ślusarz	2615b5a354	intel/compiler: user payload starts after TUE header & its padding All data written by the user are offset by TUE header size. Without this patch we copy the correct amount of user data, but both "from" and "to" offsets are wrong. Fixes: `37e78803d7` ("intel/compiler: use nir_lower_task_shader pass") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409> (cherry picked from commit `db0e6f9a07`)	2022-12-14 20:47:00 +00:00
Marcin Ślusarz	20ba98ab2f	intel/compiler: adjust [store\|load]_task_payload.base too Base also needs to be converted from bytes to words. Fixes: `c36ae42e4c` ("intel/compiler: Use nir_var_mem_task_payload") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409> (cherry picked from commit `7aaafaa8ae`)	2022-12-14 20:47:00 +00:00
Lionel Landwerlin	ac303c5d5b	intel/fs: improve Wa_22013689345 workaround The initial implementation is a pretty big hammer. Implement the HW recommendation to minimize cases in which we need a fence. This improves by 10FPS on some of the Sascha Willems RT demos. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `6031ad4bf6` ("intel/fs: Add Wa_22013689345") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19322> (cherry picked from commit `945637514e`)	2022-11-23 19:12:00 +00:00
Lionel Landwerlin	d567ac1dc8	intel/fs: put scratch surface in the surface state heap In `4ceaed7839` we made scratch surface state allocations part of the internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress) so that it doesn't uses slots in the application's expected 1M descriptors (especially with vkd3d-proton). But all our compiler code relies on BSS (STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress). The additional issue is that there is only 26bits of surface offset available in CS instruction (CFE_STATE, 3DSTATE_VS, etc...) for scratch surfaces. So we need the drivers to put the scratch surfaces in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress (hence all the driver changes). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `4ceaed7839` ("anv: split internal surface states from descriptors") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727> (cherry picked from commit `9c1c1888d9`)	2022-11-23 19:11:59 +00:00
Caio Oliveira	be102fede4	intel/compiler: Fix missing tie-breaker in brw_nir_analyze_ubo_ranges() ordering code Per Ken suggestion, use ascending order for the start offset. Fixes: `6d28c6e52c` ("i965: Select ranges of UBO data to be uploaded as push constants.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19731> (cherry picked from commit `494e2edb90`)	2022-11-17 14:05:03 +00:00
Caio Oliveira	6e4a46e2a8	intel/compiler: Fix dynarray usage in intel_clc The code builds up the dynamic array of objects (spirv_objs) and collect pointers to each of them into another dynamic array (spirv_ptr_objs). If the growth of the first array cause a reallocation, it is possible that the previous pointers end up invalid. Fixes: `77e929a527` ("intel/clc: allow multiple CL files to be compiled together") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19730> (cherry picked from commit `9fd1d47aa0`)	2022-11-17 14:05:03 +00:00
Jason Ekstrand	7701bf1228	intel: Don't cross DWORD boundaries with byte scratch load/store The back-end swizzles dwords so that our indirect scratch messages match the memory layout of spill/fill messages for better cache coherency. The swizzle happens at a DWORD granularity. If a read or write crosses a DWORD boundary, the first bit will get correctly swizzled but whatever piece lands in the next dword will not because the scatter instructions assume sequential addresses for all bytes. For DWORD writes, this is handled naturally as part of scalarizing. For smaller writes, we need to be sure that a single write never escapes a dword. Fixes: `fd04f858b0` ("intel/nir: Don't try to emit vector load_scratch instructions") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7364 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580> (cherry picked from commit `25c180b509`)	2022-11-09 21:22:06 +00:00
Jason Ekstrand	d580ab8898	intel/lower_mem_access_bit_sizes: Compute alignments automatically Because dup_mem_intrinsic() retains the SSA offset from the original intrinsic and only modifies it by adding a constant, we can compute the alignment based on the original alignment and the constant offset. This is both easier and more accurate. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580> (cherry picked from commit `85685cf932`)	2022-11-09 21:22:06 +00:00
Ian Romanick	87e7794d7b	intel/fs: Fix constant propagation into 32x16 integer multiplication Don't copy propagate the constant in situations like mov(8) g8<1>D 0x7fffffffD mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W On platforms that only have a 32x16 multiplier, this will result in lowering the multiply to mul(8) g15<1>D g14<8,8,1>D 0xffffUW mul(8) g16<1>D g14<8,8,1>D 0x7fffUW add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD Volume 2a of the Skylake PRM says: When multiplying a DW and any lower precision integer, the DW operand must on src0. See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104. Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I don't think it would be possible to create a situation where this could occur. I discovered this via some optimizations that can determine that the non-constant source must be able to fit in 16-bits. The case listed above came from piglit's "ext_transform_feedback-order arrays points" with those optimizations in place. No shader-db or fossil-db changes on any Intel platform. Fixes: `de6c0f8487` ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718> (cherry picked from commit `db20412168`)	2022-11-09 21:22:06 +00:00
Illia Abernikhin	aa4ac5ff8b	utils: Merge util/debug.* into util/u_debug.* and remove util/debug.* Rename env_var_as_unsigned() -> debug_get_num_option(), because duplicate Rename env_var_as_bool() -> debug_get_bool_option(), because duplicate Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7177 Signed-off-by: Illia Abernikhin <illia.abernikhin@globallogic.com> Reviewed-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19336>	2022-11-02 07:25:39 +00:00
Kenneth Graunke	88756cee8d	intel/compiler: Run nir_opt_large_constants before scalarizing consts nir_opt_large_constants balks at seeing a store_deref of a variable where the source is a vecN operation of multiple load_consts, and thinks that isn't a constant, so it should not bother promoting it. Unfortunately, we were running nir_lower_load_const_to_scalar before nir_opt_large_constants, so this prevented a ton of constant promotion. This commit /used to help/ some shaders in shader-db. Presumably since !16770 landed, those shaders were already helped. Currently ther are no shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141998227 -> 141421756 (-0.4%) Instructions helped: 12515 Instructions hurt: 237 SENDs in all programs: 7437925 -> 7468033 (+0.4%) SENDs hurt: 12806 Cycles in all programs: 9161655753 -> 9132869800 (-0.3%) Cycles helped: 10163 Cycles hurt: 2637 Spills in all programs: 19977 -> 18678 (-6.5%) Spills helped: 384 Spills hurt: 40 Fills in all programs: 32863 -> 31396 (-4.5%) Fills helped: 385 Fills hurt: 42 Lost: 1 Lots of Shadow of the Tomb Raider fragment shaders and Batman Arkham Origins vertex shaders were hurt for SENDs in this commit. A couple Aztec Ruins compute shaders and Spaceship shaders (multiple stages) were also hurt. All of the shaders hurt for spills or fills were Spaceship compute shaders. Nearly all of the shaders helped were Shadow of the Tomb Raider fragmenet shaders. One Spaceship shader was reall, REALLY helped: Spills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 321 -> 13 (-96.0%) Fills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 279 -> 21 (-92.5%) Overall this seems like an improvement, but we may want to actually run these few benchmarks before landing. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16539>	2022-11-01 14:55:21 -07:00
Lionel Landwerlin	920aed2121	intel/compiler: don't allocate compaction arrays on the stack Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7569 Cc: mesa-stable Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19339>	2022-10-28 07:10:58 +00:00
Lionel Landwerlin	e59c4a912b	intel/fs: use fs implementation of dump_instructions This specialized version prints out the liveness count as well as the maximum liveness count. It was eye opening when seeing the max liveness jump after lowering of packing instructions which should not have changed the count. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Lionel Landwerlin	e5dfff0946	intel/fs: reduce liveness of variables in lowering passes When lowering a single instruction with a destination VGRF to 2 or more, the VGRF is now considered partially written by each generated instruction and that increases its liveness especially in loops. Thus potentially increasing the number of spills/fills due to register allocation. Putting an UNDEF instruction in front of the lowered instructions allows the IR to limit the liveness of the VGRF, reducing register pressure. This has a pretty dramatic effect on spills/fills for RT shaders. Here the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to register allocation) : Instructions in all programs: 26150 -> 24955 (-4.6%) SENDs in all programs: 1148 -> 1148 (+0.0%) Loops in all programs: 4 -> 4 (+0.0%) Cycles in all programs: 392179 -> 332787 (-15.1%) Spills in all programs: 132 -> 116 (-12.1%) Fills in all programs: 262 -> 154 (-41.2%) Shader-db results on TGL : total instructions in shared programs: 21158140 -> 21158377 (<.01%) instructions in affected programs: 76629 -> 76866 (0.31%) helped: 18 HURT: 20 helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12 helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77% HURT stats (abs) min: 1 max: 79 x̄: 28.85 x̃: 18 HURT stats (rel) min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79% 95% mean confidence interval for instructions value: -4.82 17.30 95% mean confidence interval for instructions %-change: -0.34% 0.57% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 5753 -> 5753 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 798856834 -> 798870688 (<.01%) cycles in affected programs: 6208395 -> 6222249 (0.22%) helped: 22 HURT: 17 helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782 helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44% HURT stats (abs) min: 2 max: 19178 x̄: 2676.12 x̃: 1358 HURT stats (rel) min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71% 95% mean confidence interval for cycles value: -952.19 1662.65 95% mean confidence interval for cycles %-change: -0.64% 1.90% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4078 -> 4066 (-0.29%) spills in affected programs: 40 -> 28 (-30.00%) helped: 2 HURT: 0 total fills in shared programs: 2856 -> 2832 (-0.84%) fills in affected programs: 127 -> 103 (-18.90%) helped: 2 HURT: 0 total sends in shared programs: 998554 -> 998554 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Lionel Landwerlin	dd6d40429b	intel/fs: make split_virtual_grfs deal with partial undefs v2: fix up UNDEFs instructions (Curro) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Lionel Landwerlin	14b99df7d9	intel/fs: require UNDEFs register offsets to be aligned to REG_SIZE Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Alyssa Rosenzweig	941c37c085	nir/lower_idiv: Remove imprecise_32bit_lowering NIR has two implementations of lower_idiv, keyed on the imprecise_32bit_lowering flag. This flag is misleading: the results when setting this flag "imprecise", they're completely wrong for some values. If a backend has a native implementation of umul_high, the correct path isn't that much more expensive. If it doesn't, it's substantially slower for highp integer divison... but in practice, non-constant highp integer division is pretty rare. After a painful migration of the tree, this code path has no more users. Remove it so nobody else gets the bright idea of using it again. Closes: #6555 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>	2022-10-27 19:37:14 +00:00
Jordan Justen	c238699afa	intel/compiler: Broadcast lower code should check 64-bit int support This will affect MTL which will have fp64 support without int64 support. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19284>	2022-10-27 09:22:09 +00:00
Lionel Landwerlin	2da7ec0db9	intel/clc: assert when libclc shader is not found Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7483 Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19091>	2022-10-27 08:53:55 +00:00
Tapani Pälli	1e51383258	intel/compiler: run nir_opt_idiv_const before nir_lower_idiv Integer div lowering can potentially create a lot of code that is not removed later on. Running const lowering pass first can be used to eliminate that code. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19157>	2022-10-20 15:35:48 +03:00
Luis Felipe Strano Moraes	eff1517cd7	anv: added proper handling for input argument in intel_clc That was previously listed on the getopt_long struct but not actually being used. This makes intel_clc argument processing easier as now all of its arguments are handled with getopt and anything after the special argument '--' is passed along to clang to form the final build command. Thanks to Dylan Baker for help with changes to the meson file. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>	2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes	8de02ff980	anv: fixing typo on description of output flag for intel_clc Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>	2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes	056d72c897	anv: add missing separator to help for intel_clc intel_clc relies on the special argument '--' for getopt to be given so it knows when to start expecting purely input files or clang arguments. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>	2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes	8e1f03ada0	anv: reword info flag in intel_clc's getopt to avoid clash The info keyword was using the same short description that was listed for input files on the struct for long_options. Rewording it to 'v' and 'verbose' to be more in line with expectations. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>	2022-10-20 02:24:39 +00:00
Iván Briano	d9747169b6	anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_* VK_PIPELINE_CREATE_RAY_TRACING_SKIP_AABBS_BIT_KHR and VK_PIPELINE_CREATE_RAY_TRACING_SKIP_TRIANGLES_BIT_KHR, when specified, make TraceRay behave as if the corresponding shader flags were set, but without affecting the value of IncomingRayFlags in shaders. v2 (Lionel): Improve comments Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19152>	2022-10-20 00:03:55 +00:00
Rohan Garg	43169dbbe5	intel/compiler: Support 16 bit float ops Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17988>	2022-10-17 15:56:28 +02:00
Kenneth Graunke	2dfab687ec	intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes [v2] Setting the NIR options takes care of iris thanks to the common st/mesa linking code, and updating brw_nir_link_shaders should handle anv. The main effort here is updating remap_tess_levels, which needs to handle vector stores, writemasking, and swizzling. Unfortunately, we also need to continue handling the existing single-component access because it's used for TES inputs, which we don't vectorize. We could try to vectorize TES inputs too, but they're all pushed anyway, so it wouldn't buy us much other than deleting this code. Also, we do have opt_combine_stores, but not one for loads. One limitation of using nir_vectorize_tess_levels is that it works on variables, and so isn't able to combine outer/inner writes that happen to live in the same vec4 slot (for triangle domains). That said, it's still better than before. For writes, we allow the intrinsics to supply up to the full size of the variable (vec4 for outer, vec2 for inner) even if the domain only requires a subset of those components (i.e. triangles needs 3). shader-db results on Icelake: total instructions in shared programs: 19600314 -> 19597528 (-0.01%) instructions in affected programs: 65338 -> 62552 (-4.26%) helped: 271 / HURT: 0 helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12 helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59% 95% mean confidence interval for instructions value: -10.71 -9.85 95% mean confidence interval for instructions %-change: -6.17% -5.43% Instructions are helped. total cycles in shared programs: 851842332 -> 851808165 (<.01%) cycles in affected programs: 618577 -> 584410 (-5.52%) helped: 271 / HURT: 0 helped stats (abs) min: 64 max: 540 x̄: 126.08 x̃: 111 helped stats (rel) min: 2.57% max: 37.97% x̄: 6.12% x̃: 5.06% 95% mean confidence interval for cycles value: -135.35 -116.80 95% mean confidence interval for cycles %-change: -6.67% -5.57% Cycles are helped. total sends in shared programs: 1025238 -> 1024308 (-0.09%) sends in affected programs: 6454 -> 5524 (-14.41%) helped: 271 / HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4 helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39% 95% mean confidence interval for sends value: -3.57 -3.29 95% mean confidence interval for sends %-change: -15.42% -14.54% Sends are helped. According to Felix DeGrood, this results in a 10% improvement in the draw call time for certain draw calls from Strange Brigade. v2: Fix assertions about number of components and add more of them. Combine the quads and triangles handling as it's nearly identical. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19061>	2022-10-13 11:38:21 -07:00
Tapani Pälli	c9c9a5b78d	intel/fs: mark debug variables with ASSERTED To clean up compilation warnings about unused variables when asserts are disabled. v2: UNUSED -> ASSERTED (Eric Engestrom) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19016>	2022-10-11 04:14:30 +00:00
Caio Oliveira	6cda887ac6	intel/compiler: Explicitly include build-id when linking intel_clc Ensure that the program will have a build-id. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18924>	2022-10-01 16:42:07 +00:00
Kenneth Graunke	b61b1d5a4c	Revert "intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes" This reverts commit `abba55382f`. The assertions I added late in the process broke shader-db, and my quick fix broke CI, so let's just revert it for now and I'll resubmit this later when it's working better. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7385 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18895>	2022-09-29 17:39:18 -07:00
Marcin Ślusarz	9bac88856d	intel/compiler: fix loading of draw_id from task & mesh payload Previously both destination and source were floats, so no casting was performed, but with `7664c85b1d` source register was reinterpreted as unsigned integer, so MOV started casting that integer to float. Fixes: `7664c85b1d` ("intel/compiler: Create and use struct for TASK and MESH thread payloads") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18886>	2022-09-29 17:17:25 +00:00
Lionel Landwerlin	23c7142cd6	anv: disable SIMD16 for RT shaders Since divergence is a lot more likely in RT than compute, it makes sense to limit ourselves to SIMD8. The trampoline shader defaults to SIMD16 since this one is uniform. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:37 +00:00
Lionel Landwerlin	8fc7a98e31	intel/fs: disable split_array_vars on opencl kernels Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	57593c5395	intel/nir: disable assert on async stack id This can be accessed from : - RT shaders - CS trampoline shader We missed the second part here. Fixes: `0465714790` ("intel/nir/rt: add more helpers for ray queries") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	8d580de4a9	intel/nir: fix potential invalid function impl ptr usage We keep the nir_builder::impl value around, but we've run some passes that might have change the main function. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `96fde5518b` ("intel/rt: Add a helper to create the raygen trampoline shader") Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	1ffd28149f	intel/nir: fixup preserved metadata in rayquery lowering Another case of not clearing the metadata correctly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c78be5da30` ("intel/fs: lower ray query intrinsics") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	9dba8d8aa1	intel/fs: take a builder arg for resolve_source_modifiers() There will be situations where we will want to use a local builder rather than the one associated with NIR->backend translation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	649cdc617f	intel/nir: reuse rt helper Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	57f1e95102	intel/rt: fix procedural primitive ID access Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Jason Ekstrand	aea88f16df	intel/fs: SEL_EXEC uses the integer pipe for 64-bit stuff Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Jason Ekstrand	c80c0ed943	intel/fs: Always use integer types for indirect MOVs There's a new Gen12.5 restriction which forbids using the VxH or Vx1 on the floating-point pipe. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	c6a7f4b34e	intel/devinfo: Rename & implement num_dual_subslices v2: Use the upper bound of dual subslices as the ID is not remapped with fused off parts and this is what we'll use for a bunch of computation in RT. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Kenneth Graunke	abba55382f	intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes Setting the NIR options takes care of iris thanks to the common st/mesa linking code, and updating brw_nir_link_shaders should handle anv. The main effort here is updating remap_tess_levels, which needs to handle vector stores, writemasking, and swizzling. Unfortunately, we also need to continue handling the existing single-component access because it's used for TES inputs, which we don't vectorize. We could try to vectorize TES inputs too, but they're all pushed anyway, so it wouldn't buy us much other than deleting this code. Also, we do have opt_combine_stores, but not one for loads. One limitation of using nir_vectorize_tess_levels is that it works on variables, and so isn't able to combine outer/inner writes that happen to live in the same vec4 slot (for triangle domains). That said, it's still better than before. For writes, we allow the intrinsics to supply up to the full size of the variable (vec4 for outer, vec2 for inner) even if the domain only requires a subset of those components (i.e. triangles needs 3). shader-db results on Icelake: total instructions in shared programs: 19605070 -> 19602284 (-0.01%) instructions in affected programs: 65338 -> 62552 (-4.26%) helped: 271 / HURT: 0 helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12 helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59% 95% mean confidence interval for instructions value: -10.71 -9.85 95% mean confidence interval for instructions %-change: -6.17% -5.43% Instructions are helped. total cycles in shared programs: 851854659 -> 851820320 (<.01%) cycles in affected programs: 618749 -> 584410 (-5.55%) helped: 271 / HURT: 0 helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108 helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06% 95% mean confidence interval for cycles value: -135.89 -117.54 95% mean confidence interval for cycles %-change: -6.72% -5.63% Cycles are helped. total sends in shared programs: 1025285 -> 1024355 (-0.09%) sends in affected programs: 6454 -> 5524 (-14.41%) helped: 271 / HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4 helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39% 95% mean confidence interval for sends value: -3.57 -3.29 95% mean confidence interval for sends %-change: -15.42% -14.54% Sends are helped. According to Felix DeGrood, this results in a 10% improvement in the draw call time for certain draw calls from Strange Brigade. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>	2022-09-27 18:17:56 -07:00
Kenneth Graunke	be21d54aca	intel/compiler: Use an existing URB write to end TCS threads when viable VS, TCS, TES, and GS threads must end with a URB write message with the EOT (end of thread) bit set. For VS and TES, we shadow output variables with temporaries and perform all stores at the end of the shader, giving us an existing message to do the EOT. In tessellation control shaders, we don't defer output stores until the end of the thread like we do for vertex or evaluation shaders. We just process store_output and store_per_vertex_output intrinsics where they occur, which may be in control flow. So we can't guarantee that there's a URB write being at the end of the shader. Traditionally, we've just emitted a separate URB write to finish TCS threads, doing a writemasked write to an single patch header DWord. On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is a convenient spot to do so. But on other platforms, there's no such field, and this write is purely wasteful. Insetad of emitting a separate write, we can just look for an existing URB write at the end of the program and tag that with EOT, if possible. We already had code to do this for geometry shaders, so just lift it into a helper function and reuse it. No changes in shader-db. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>	2022-09-27 18:17:42 -07:00
Lionel Landwerlin	e76e3d9cea	intel/nir/rt: fixup alignment of memcpy iterations Not sure if fixes anything because it's always 16 at least, but this is more correct. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>	2022-09-23 08:29:17 +00:00
Lionel Landwerlin	139e8f4635	intel/fs: fixup a64 messages And run algebraic when either int64 for float64 are not supported so those don't end up in the generated code. Cc: mesa-stable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>	2022-09-23 08:29:17 +00:00

1 2 3 4 5 ...

2276 commits