This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.
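The call-site change looks roughly like this (a sketch; variable
names are illustrative):

    /* before: the replacement had to be wrapped in a nir_src */
    nir_ssa_def_rewrite_uses_after(&old->dest.ssa,
                                   nir_src_for_ssa(new_def),
                                   new_def->parent_instr);
    /* after: the nir_ssa_def is passed directly */
    nir_ssa_def_rewrite_uses_after(&old->dest.ssa, new_def,
                                   new_def->parent_instr);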
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
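The caller-side simplification looks like this (a sketch; variable
names are illustrative):

    /* before: two entry points, one taking a nir_src */
    nir_ssa_def_rewrite_uses(&old->dest.ssa, nir_src_for_ssa(new_def));
    nir_ssa_def_rewrite_uses_ssa(&old->dest.ssa, new_def);

    /* after: a single entry point taking the SSA def */
    nir_ssa_def_rewrite_uses(&old->dest.ssa, new_def);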
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
It's safe to do the math in 32 bits because these are all local
workgroup calculations; we just need a conversion to 64 bits at the
end. For a couple of intrinsics, we simply emit the 32-bit variant and
add a u2u64.
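For example, a 64-bit load can be lowered along these lines (a
minimal nir_builder sketch; details assumed):

    /* do the workgroup-local math in 32 bits... */
    nir_ssa_def *index32 = nir_load_local_invocation_index(b);
    /* ...and do a single conversion at the end */
    nir_ssa_def *index64 = nir_u2u64(b, index32);
    nir_ssa_def_rewrite_uses(&intrin->dest.ssa, index64);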
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
After 2663759af0 ("intel/fs: Add and use a new load_simd_width_intel
intrinsic"), local_workgroup_size is no longer used except for
assertions at the start of the pass, so drop it from the state struct.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5213>
Add an intrinsic to get the SIMD width, which is not always the same
as the subgroup size. Start with a small scope (Intel); we can rename
it later to generalize it if it turns out to be useful for other
drivers.
Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width passed as an argument. The pass also used to optimize
load_subgroup_id for the case where the workgroup fits into a single
thread (the ID will be constant zero); this optimization moves
together with the lowering of the SIMD width.
This is a preparation for letting the drivers call it before the
brw_compile_cs() step.
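Inside the pass, the width is now read from the intrinsic, roughly (a
sketch; details assumed):

    /* query the SIMD width chosen by the back-end */
    nir_ssa_def *simd_width = nir_load_simd_width_intel(b);
    /* e.g. local index = subgroup_id * simd_width + channel */
    nir_ssa_def *index =
       nir_iadd(b, nir_imul(b, nir_load_subgroup_id(b), simd_width),
                nir_load_subgroup_invocation(b));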
No shader-db changes in BDW, SKL, ICL and TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
Add new builtin parameters that are used to keep track of the group
size. This will be used to implement ARB_compute_variable_group_size.
The compiler will use the maximum group size supported to pick a
suitable SIMD variant. A later improvement will be to keep all SIMD
variants (like FS) so the driver can select the best one at dispatch
time.
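The group size reaches the shader through the builtin param
mechanism, along these lines (a sketch; the enum names are
assumptions):

    /* one builtin param per workgroup dimension */
    prog_data->param[idx++] = BRW_PARAM_BUILTIN_WORK_GROUP_SIZE_X;
    prog_data->param[idx++] = BRW_PARAM_BUILTIN_WORK_GROUP_SIZE_Y;
    prog_data->param[idx++] = BRW_PARAM_BUILTIN_WORK_GROUP_SIZE_Z;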
When the variable workgroup size is used, the small workgroup
optimization is disabled, because we can't prove at compile time that
the barriers won't be needed.
Extracted from original i965 patch with additional changes by
Caio Marcelo de Oliveira Filho.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers
(as appropriate) prior to the nir_intrinsic_barrier, we don't need to do
it ourselves in the back-end. This reverts commit
26e950a5de01564e3b5f2148ae994454ae5205fe.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Our barrier instruction does not implicitly do a memory fence but the
GLSL barrier() intrinsic is supposed to. The easiest back-portable
solution is to just add the NIR barriers. We'll sort this out more
properly in later commits.
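In NIR terms, the fix amounts to something like this (a sketch using
the intrinsic names of that era; insertion details assumed):

    /* GLSL barrier() implies a memory fence, so emit one explicitly
     * before the control barrier */
    nir_intrinsic_instr *fence =
       nir_intrinsic_instr_create(b->shader,
                                  nir_intrinsic_memory_barrier);
    nir_builder_instr_insert(b, &fence->instr);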
Cc: mesa-stable@lists.freedesktop.org
Closes: #2138
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This was needed when certain intrinsics were lowered to other ones
that were defined by the same pass. After 060817b2 ("intel,nir: Move
gl_LocalInvocationID lowering to nir_lower_system_values"), we don't
need the loop anymore.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map two to the "current" row and two to the
"next" row. A side effect is that a thread will execute the indices in
a different order.
We now perform the lowering of both the local invocation ID and the
index together, and no longer rely on the lowering done by
nir_lower_system_values. That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.
When the pass makes progress, fold the constants and clean up to
reduce the noise from the indexing math.
This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.
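The mapping boils down to the following index math (an illustrative
scalar sketch, not the literal pass code):

    /* within a quad: bit 0 selects the column, bit 1 the row */
    unsigned quad = index >> 2;
    unsigned quads_per_row = size_x / 2;   /* size_x must be even */
    unsigned x = (quad % quads_per_row) * 2 + (index & 1);
    unsigned y = (quad / quads_per_row) * 2 + ((index >> 1) & 1);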
v2: Take subgroup_id into account, otherwise only values in the first
subgroup would be used. (Jason)
v3: Calculate invocation index and ID together, to avoid duplicating
some math in the quads case when both index and ID are used. (Jason)
v4: Don't call cleanup passes as part of the lowering; leave that to
the call site. (Jason)
Change calculation to use less instructions. (Jason)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's not at all intel-specific; the formula is dictated by OpenGL and
Vulkan. The only intel-specific thing is that we need the lowering. As
a nice side-effect, the new version is variable-group-size ready.
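For reference, the formula mandated by the specs, written as a plain
C expression:

    /* gl_LocalInvocationIndex */
    index = id_z * size_x * size_y + id_y * size_x + id_x;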
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
We're going to want subgroup ID for SPIR-V subgroups eventually anyway.
We really only want to push one and calculate the other from it. It
makes a bit more sense to push the subgroup ID because it's simpler to
calculate and because it's a real API thing. The only advantage to
pushing the base thread ID is to avoid a single SHL in the shader.
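That is, the base thread ID can be recovered from the subgroup ID
with one shift (a sketch, assuming a power-of-two SIMD width):

    /* base thread-local ID = subgroup_id * simd_width */
    base_thread_id = subgroup_id << log2_simd_width;  /* one SHL */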
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, brw_nir_lower_intrinsics added the param and then emitted a
load_uniform intrinsic to load it directly. This commit switches things
over to use a specific NIR intrinsic for the thread id. The one thing I
don't like about this approach is that we have to copy thread_local_id
over to the new visitor in import_uniforms.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is a lot more natural than special casing it all over the place.
We still have to do a bit of special-casing in assign_constant_locations
but it's not special-cased quite as badly as it was before.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of making the caller of brw_compile_cs add something to the
param array for thread_local_id_index, just add it on-demand in
brw_nir_intrinsics and grow the array. This is now safe to do because
everyone is now using ralloc for prog_data::param.
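Growing the array on demand looks roughly like this (a sketch;
reralloc is the real helper, while the element type and field
placement are assumptions):

    /* safe now that prog_data->param is ralloc'ed */
    cs_prog_data->thread_local_id_index = prog_data->nr_params++;
    prog_data->param = reralloc(mem_ctx, prog_data->param, uint32_t,
                                prog_data->nr_params);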
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's already only ever called from brw_compile_cs and only handles
compute intrinsics. Let's just make it CS-specific. We can always
make it handle other stages again later if we want.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Renamed from src/intel/compiler/brw_nir_intrinsics.c