Commit graph

21 commits

Author SHA1 Message Date
Jason Ekstrand
e20e85f01e nir: Make nir_ssa_def_rewrite_uses_after take an SSA value
This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.

Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
2021-03-08 16:59:55 +00:00
Jason Ekstrand
117668b811 nir: Make nir_ssa_def_rewrite_uses take an SSA value
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
2021-03-08 16:59:55 +00:00
Jason Ekstrand
fe18a0fd45 intel/nir: Lower load_num_work_groups to 32-bit if needed
For OpenCL-style kernels, this builtin is 64-bit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6570>
2020-09-02 20:38:22 +00:00
Jason Ekstrand
003b04e266 intel/compiler: Allow MESA_SHADER_KERNEL
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
2020-08-12 10:11:06 +00:00
Jason Ekstrand
8e1de8e5ac intel/cs_intrinsics: Handle 64-bit intrinsics
It's safe to do the math in 32 bits because they're all local workgroup
calculations.  We just need to do a conversion at the end.  For a couple
of intrinsics, we just turn them into 32-bit intrinsics and add a u2u64.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
2020-08-12 10:11:06 +00:00
Caio Marcelo de Oliveira Filho
2a308ee4c7 intel/fs: Remove unused state from brw_nir_lower_cs_intrinsics
After 2663759af0 ("intel/fs: Add and use a new load_simd_width_intel
intrinsic") the local_workgroup_size is not used anymore except for
assertions at the pass' start, so drop it from state struct.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5213>
2020-05-26 20:35:03 +00:00
Caio Marcelo de Oliveira Filho
2663759af0 intel/fs: Add and use a new load_simd_width_intel intrinsic
Intrinsic to get the SIMD width, which not always the same as subgroup
size.  Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.

Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument.  The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero).  This optimization moved together
with lowering of the SIMD.

This is a preparation for letting the drivers call it before the
brw_compile_cs() step.

No shader-db changes in BDW, SKL, ICL and TGL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:37 -07:00
Plamena Manolova
c77dc51203 intel/compiler: Add support for variable workgroup size
Add new builtin parameters that are used to keep track of the group
size.  This will be used to implement ARB_compute_variable_group_size.

The compiler will use the maximum group size supported to pick a
suitable SIMD variant.  A later improvement will be to keep all SIMD
variants (like FS) so the driver can select the best one at dispatch
time.

When variable workgroup size is used, the small workgroup optimization
is disabled as it we can't prove at compile time that the barriers
won't be needed.

Extracted from original i965 patch with additional changes by
Caio Marcelo de Oliveira Filho.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
2020-04-09 19:23:12 -07:00
Jason Ekstrand
bd3ab75aef intel/nir: Stop adding redundant barriers
Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers
(as appropreate) prior to the nir_intrinsic_barrier, we don't need to do
it ourselves in the back-end.  This reverts commit
26e950a5de01564e3b5f2148ae994454ae5205fe.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
803fad43c3 intel/nir: Add a memory barrier before barrier()
Our barrier instruction does not implicitly do a memory fence but the
GLSL barrier() intrinsic is supposed to.  The easiest back-portable
solution is to just add the NIR barriers.  We'll sort this out more
properly in later commits.

Cc: mesa-stable@lists.freedesktop.org
Closes: #2138
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2020-01-07 21:52:19 -06:00
Caio Marcelo de Oliveira Filho
0425b34b79 intel/fs: Don't loop when lowering CS intrinsics
This was needed when certain intrinsics were lowered to other ones
that were defined by the same pass.  After 060817b2 "intel,nir: Move
gl_LocalInvocationID lowering to nir_lower_system_values" we don't
need the loop anymore.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
3ee3024804 intel/fs: Add support for CS to group invocations in quads
When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map the two next in the "current" row and two
next in the "next row".  A side effect is that a thread will execute
the indices in a different order.

We now perform the lowering of both local invocation ID and index
together -- and don't rely anymore on lowering done by
nir_lower_system_values.  That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.

When the pass progresses, fold the constants and clean up to reduce
the noise from the indexing math.

This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.

v2: Take subgroup_id into account, otherwise only values in the first
    subgroup would be used. (Jason)

v3: Calculate invocation index and ID together, to avoid duplicating
    some math in the quads case when both index and ID are used. (Jason)

v4: Don't call cleanup passes as part of the lowering, let that to the
    call site. (Jason)
    Change calculation to use less instructions. (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Jason Ekstrand
060817b2fa intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values
It's not at all intel-specific; the formula is dictated by OpenGL and
Vulkan.  The only intel-specific thing is that we need the lowering.  As
a nice side-effect, the new version is variable-group-size ready.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-11-19 09:57:41 -06:00
Jason Ekstrand
974daec495 i965/fs: Implement basic SPIR-V subgroup intrinsics
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
295605c930 intel/cs: Push subgroup ID instead of base thread ID
We're going to want subgroup ID for SPIR-V subgroups eventually anyway.
We really only want to push one and calculate the other from it.  It
makes a bit more sense to push the subgroup ID because it's simpler to
calculate and because it's a real API thing.  The only advantage to
pushing the base thread ID is to avoid a single SHL in the shader.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand
80ddfab2f5 intel/cs: Rework the way thread local ID is handled
Previously, brw_nir_lower_intrinsics added the param and then emitted a
load_uniform intrinsic to load it directly.  This commit switches things
over to use a specific NIR intrinsic for the thread id.  The one thing I
don't like about this approach is that we have to copy thread_local_id
over to the new visitor in import_uniforms.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jordan Justen
b35e8c3b86 intel/nir: Zero local index const struct for valgrind & nir_serialize
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-10-25 12:36:21 -07:00
Jason Ekstrand
59fb59ad54 nir: Get rid of nir_shader::stage
It's redundant with nir_shader::info::stage.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-10-20 12:49:17 -07:00
Jason Ekstrand
79d403417c intel/cs: Make thread_local_id a regular builtin param
This is a lot more natural than special casing it all over the place.
We still have to do a bit of special-casing in assign_constant_locations
but it's not special-cased quite as bad as it was before.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:31 -07:00
Jason Ekstrand
6bcc5c0c75 intel/cs: Grow prog_data::param on-demand for thread_local_id_index
Instead of making the caller of brw_compile_cs add something to the
param array for thread_local_id_index, just add it on-demand in
brw_nir_intrinsics and grow the array.  This is now safe to do because
everyone is now using ralloc for prog_data::param.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:30 -07:00
Jason Ekstrand
b1d1b7222a intel/compiler: Make brw_nir_lower_intrinsics compute-specific
It's already only ever called from brw_compile_cs and only handles
compute intrinsics.  Let's just make it CS-specific.  We can always
make it handle other stages again later if we want.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:30 -07:00
Renamed from src/intel/compiler/brw_nir_intrinsics.c (Browse further)