The Intel backend compiler does not deal well with the scratch loads
emitted by this pass, for two reasons:
- all the loads are at the top of the shader
- the loads are global load intrinsics (they cannot be differentiated
  from SSBO loads, for example)
This leads the backend to generate a ridiculous amount of spills.
To help a bit (actually quite a lot), we can move the scratch loads
into the blocks where they're needed, using the dominance information.
Quite often that also ends up moving loads into a block that might not
be reached by all the lanes, so we're potentially avoiding some loads.
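As a rough sketch of the block selection (not the exact
implementation), assuming dominance metadata has been required on the
function and ignoring phi uses for brevity:

static nir_block *
least_common_use_block(nir_intrinsic_instr *load)
{
   nir_block *lca = NULL;

   nir_foreach_use(use, &load->dest.ssa) {
      /* Phi uses would have to be handled against the corresponding
       * predecessor block, skipped here for brevity.
       */
      nir_block *use_block = use->parent_instr->block;

      lca = lca ? nir_dominance_lca(lca, use_block) : use_block;
   }

   return lca;
}

The load can then be moved into that block (e.g. with
nir_instr_move()), which stays valid because the common dominator of
all the use blocks still dominates every use.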
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
The previous pass, shrinking values stored on the stack, might have
left some gaps on the stack (a vec4 turned into a vec3 for instance).
This pass reorders variables on the stack, by component bit size and
by SSA value number. The component size is useful to pack smaller
values together. The SSA value number is also important because, if
we have 2 calls spilling the same values, we can avoid re-emitting the
spills if the values are stored in the same location.
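The sorting criterion boils down to a comparison like the following
(illustrative only, with a hypothetical entry struct rather than the
pass's actual data structure):

struct spill_entry {
   unsigned bit_size;   /* component bit size of the spilled value */
   unsigned ssa_index;  /* index of the SSA def being spilled */
};

static int
compare_spill_entries(const void *_a, const void *_b)
{
   const struct spill_entry *a = _a, *b = _b;

   /* Group values with the same component size together... */
   if (a->bit_size != b->bit_size)
      return a->bit_size < b->bit_size ? -1 : 1;

   /* ...then order by SSA index so the layout is reproducible across
    * call sites spilling the same values.
    */
   return a->ssa_index < b->ssa_index ? -1 :
          a->ssa_index > b->ssa_index ?  1 : 0;
}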
v2: Remove unused sorting function (Konstantin)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
For example, if we store a vec4 to scratch but only a subset of its
components is used after the load operation.
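As an illustration of the reswizzling (hypothetical values, not the
pass's code), the remapping from old component indices to their packed
positions can be built like this:

/* Assume only components x, y and w of a vec4 are read back. */
unsigned read_mask = 0xb;
uint8_t swizzle[NIR_MAX_VEC_COMPONENTS] = { 0 };
unsigned num_components = 0;

/* u_foreach_bit() comes from util/bitscan.h and declares the loop
 * variable itself.
 */
u_foreach_bit(comp, read_mask)
   swizzle[comp] = num_components++;

/* num_components == 3 and swizzle[3] == 2: the store shrinks to a
 * vec3, and uses of the old w component now read component 2.
 */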
v2: Use nir_intrinsic_write_mask (Konstantin)
Use u_foreach_bit() instead of u_bit_scan() (Konstantin)
Fix mask building loop (Konstantin)
v3: Fix reswizzle (Konstantin)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
Previously, when considering whether to rematerialize or spill/fill
ssa_1954, we would go for a spill/fill:
vec4 32 ssa_388 = (float32)txf ssa_387 (texture_handle), ssa_86 (coord), ssa_23 (lod), 0 (texture), 0 (sampler)
...
vec1 32 ssa_1953 = load_const (0xbd23d70a = -0.040000)
vec1 32 ssa_1954 = fadd ssa_388.x, ssa_1953
vec1 32 ssa_1955 = fneg ssa_1954
This is because, when looking at ssa_1955 the first time, we would
consider ssa_388 unrematerializable, and therefore all values built on
top of it would be considered unrematerializable as well.
The missing piece when considering whether to rematerialize ssa_1954
is that we should look at filled values. Now that ssa_388 has been
spilled/filled, we can rebuild ssa_1955 on top of the filled value and
avoid spilling/filling ssa_1955 at all.
This requires a bit more work though. We can't just look at an
instruction in isolation; we need to walk the SSA chains until we find
values we can or cannot rematerialize.
In this change we build a list of all the SSA values involved in
building a given value, up to the point where we find a filled or a
rematerializable value.
In this particular case, looking at ssa_1955:
* We can rematerialize ssa_388 from its filled value
* We can rematerialize ssa_1953 trivially
* We can rematerialize ssa_1954 because its 2 inputs are rematerializable
* We can rematerialize ssa_1955 because ssa_1954 is rematerializable
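A rough sketch of that chain walk (is_filled_value() is a hypothetical
stand-in for the pass's fill bookkeeping, and the real code builds an
explicit list instead of recursing; only load_const, undef and ALU
instructions are handled here for brevity):

static bool
can_remat_chain(nir_ssa_def *def)
{
   if (is_filled_value(def))
      return true;

   nir_instr *instr = def->parent_instr;

   switch (instr->type) {
   case nir_instr_type_load_const:
   case nir_instr_type_ssa_undef:
      return true;

   case nir_instr_type_alu: {
      nir_alu_instr *alu = nir_instr_as_alu(instr);
      /* Rematerializable only if everything it is built from is
       * itself filled or rematerializable.
       */
      for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
         if (!can_remat_chain(alu->src[i].src.ssa))
            return false;
      }
      return true;
   }

   default:
      return false;
   }
}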
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
Currently we do something like this:
ssa_0 = ...
ssa_1 = ...
* spill ssa_0, ssa_1
call1()
* fill ssa_0, ssa_1
ssa_2 = ...
ssa_3 = ...
* spill ssa_0, ssa_1, ssa_2, ssa_3
call2()
* fill ssa_0, ssa_1, ssa_2, ssa_3
If we assign the same position to ssa_0 & ssa_1 in the spilling
stack, then on call2() we know that those values are already present
in memory at the right location and we can avoid re-spilling them.
The result would be something like this:
ssa_0 = ...
ssa_1 = ...
* spill ssa_0, ssa_1
call1()
* fill ssa_0, ssa_1
ssa_2 = ...
ssa_3 = ...
* spill ssa_2, ssa_3
call2()
* fill ssa_0, ssa_1, ssa_2, ssa_3
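An illustrative sketch of the bookkeeping behind this (hypothetical
types, not the pass's actual structures): a value only needs to be
spilled again if it has no assigned slot yet or if its assigned offset
changed.

struct spill_slot {
   bool     valid;   /* already spilled for an earlier call site */
   unsigned offset;  /* byte offset assigned in the spill stack */
};

static bool
needs_respill(const struct spill_slot *slot, unsigned offset)
{
   return !slot->valid || slot->offset != offset;
}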
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
For a follow-up optimization, we would like to track scratch loads.
This isn't possible with global load/store intrinsics, so use a couple
of special intrinsics in the pass and only lower them to global
intrinsics at the end.
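With dedicated intrinsics, the follow-up passes can identify stack
accesses with a check along these lines (load_stack/store_stack are
the intrinsic names assumed here), which would be impossible once
everything has been turned into load/store_global:

static bool
instr_is_stack_access(nir_instr *instr)
{
   if (instr->type != nir_instr_type_intrinsic)
      return false;

   nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);

   return intrin->intrinsic == nir_intrinsic_load_stack ||
          intrin->intrinsic == nir_intrinsic_store_stack;
}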
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
When moving code into the main block or into loop blocks, put the
code into its own
if(true) { ... }
block so that we avoid break/continue/return issues.
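A minimal sketch of the wrapping with the standard nir_builder
helpers (the moved instructions then land inside the then-branch):

nir_if *dummy_if = nir_push_if(b, nir_imm_true(b));
/* ... re-insert the moved instructions at b->cursor ... */
nir_pop_if(b, dummy_if);

The always-true condition makes the dummy if trivial for later
control-flow cleanup to delete.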
v2: Also take care of the main block with return instructions
v3: Make deletion more obvious with dummy if blocks (Jason)
v4: Fixup assert for loops (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dfb240b1f ("nir: Add raytracing shader call lowering pass.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16036>
When moving code from below up to the insertion cursor point, if the
cursor points to a jump instruction, don't bother inserting the code.
It would break NIR's break/continue assumptions and would not be
executed anyway.
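The check amounts to something like this sketch against NIR's cursor
representation (only the after-instruction case is shown):

static bool
cursor_is_after_jump(nir_cursor cursor)
{
   return cursor.option == nir_cursor_after_instr &&
          cursor.instr->type == nir_instr_type_jump;
}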
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dfb240b1f ("nir: Add raytracing shader call lowering pass.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16036>
Stop using nop instructions, which are causing issues with
break/continue; instead use a nir_cursor (which brings its own share
of pain).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dfb240b1f ("nir: Add raytracing shader call lowering pass.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16036>
Helpful when lost in a sea of NIR :)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15887>
v2: Add helper for acceleration->root_node computation (Caio)
v3: Update comment on "done" bit (Caio)
Remove progress bool value for impl function (Caio)
Don't use nir_shader_instructions_pass to search the shader (Caio)
v4: Rename variable for if/else block (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This will allow reusing the same intrinsic for various topology-based
IDs.
v2: fix intrinsic comment (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
Replace load_mesh_global_arg_addr_intel with a more general
intrinsic, load_mesh_inline_data_intel, since inline data now holds
both descriptor pointer information and the first few push constants.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
The static struct bitset was renamed to brw_bitset, as a struct
bitset is already defined in sys/_bitset.h, which is included by
pthread_np.h on FreeBSD and indirectly included by
src/intel/compiler/brw_nir_lower_shader_calls.c.
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
Really copying Jason's pass.
Changes:
- Instead of all the Intel lowering, introduce rt_{execute_callable,trace_ray,resume}
- Add the ability to use scratch intrinsics directly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10339>