Some fields in schedule_node will need to be reset each time they are
used. The `cand_generation` needs to be back to zero, and both
`unblocked_time` and `parent_count` need to be back to their initial
values, which were pre-calculated.
Rename the initial data fields and add new ones for the temporary data.
Note the helper function is `per node` to allow it "tag along" with an
existing loops.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
This will be useful later on when we reuse the same scheduler for
multiple modes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
- Just assert in functions we expect it to exist
- Predicate usage with `!post_reg_alloc` to avoid suggest there are more
combinations.
- Reuse an existing loop to call the count function.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
Values are used together, saves one pointer in schedule_node,
reduces amount of reallocations when children count grows.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
Pull run() and schedule_instructions() for fs, and pull a very
simplified version of those into a run() for vec4. Because of the
previous patches the duplication is small.
Since we are touching these, change run() implementations to use the
cfg from the existing reference to the visitor/shader instead of taking
one as argument.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
The list was used for iterating through all instructions and then
later also to track the available ones. Now that the array iteration
is used, change how we fill it and rename it to reflect its only job.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
For all the preparation data collection before the scheduling
actually happens, it is possible to walk the schedule nodes
in order by iterating on the range of the array dedicated to
a given block.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
It is always the same for all nodes, so use the one available in the
scheduler itself.
Also, per Matt's suggestion, collect is_haswell from devinfo instead of
from a function argument.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
The current name doesn't cover all the tex related instructions and
in all usages, we already have a switch statement to dispatch
per instruction type, so is more natural to list the instructions we
care there.
In fs::is_send_from_grf() we can simply ignore them since the
instructions are either lowered directly to SEND (Gfx7+) or use
MRF (Gfx6-).
With this change, the fs_inst::size_read() generated code gets
simplified (the "tex" entries get added to the switch jump table
in gcc) and the default case loses the conditional handling tex.
This reduces shader compilation time, as illustrated by replaying
fossils (tested on my TGL laptop):
```
// Rise of the Tomb Raider (N=13)
Difference at 95.0% confidence
-1.32231 +/- 0.0170138
-4.37605% +/- 0.0563054%
(Student's t, pooled s = 0.0210159)
// Cyberpunk 2077 (N=7)
Difference at 95.0% confidence
-3.64 +/- 0.114993
-2.95188% +/- 0.0932544%
(Student's t, pooled s = 0.09873)
```
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>
This is lowered to 32-bit integer execution type by the regioning
lowering pass now, so the existing special casing is redudant for
Gfx12 and buggy for Xe2+, since SEL_EXEC is now emitted without
lowering for 64-bit integers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
Extended math instructions are now synchronized as in-order
instructions like other ALU operations, which is more efficient than
the out-of-order tracking we had to do in previous generations, and
avoids false dependencies introduced due to SBID aliasing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
Xe2 hardware has a "long" EU pipeline specifically for FP64
instructions, so these are handled as in-order instructions which
require RegDist synchronization. 64-bit integer instructions are now
handled by the normal integer pipeline, so the existing special-casing
inherited from ATS needs to be disabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
This implements the extended 10-bit encoding of the software
scoreboard information used by Xe2 platforms. The new encoding is
different enough that there are few opportunities for sharing code
during translation to machine code, but the high-level tgl_swsb
representation remains roughly the same.
Among other changes the 10-bit SWSB format provides 5 bits worth of
SBID tokens (though they're only usable in large GRF mode) instead of
4 bits, the extended math pipeline is handled as an in-order (RegDist)
pipeline instead of as an out-of-order one, and the dual-argument
encodings support additional combinations of RegDist and SBID
synchronization modes. A new encoding is introduced for preventing
the accumulator hardware scoreboard from being updated, but this is
currently not needed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
The workaround applies specifically to Cube and Cube Arrays, so we can
still apply the optimization for the others.
Ideally we would like to pull opt_zero_samples logic into the lowering
sends -- to avoid adding a bit to communicate between passes. However
the texture coordinates for the LOGICAL backend instructions, which
are a common target for the optimization, are combined into offsets over
a single VGRF, so we can't easily identify the constant cases. The
copy-prop pass make this more visible for opt_zero_samples.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
Inadvertently, because of a sequence of changes elsewhere, this pass
ended up not having any effect:
- Before Gfx5 the optimization is not applicable.
- On Gfx5-6 it doesn't apply because it sampler operations don't
currently use LOAD_PAYLOAD, but write the MOVs directly. Not clear to
me whether they ever did.
- On Gfx7+ it doesn't apply anymore because now the logical sampler
operations are now lowered directly to SENDs, and the is_tex() check
would skip them.
Since the LOAD_PAYLOAD implementation applies for Gfx7+ only, rework the
pass to work again by handling SEND instructions. To make the pass
easier, the optimization will happen before opt_split_sends() so only
one LOAD_PAYLOAD needs to be cared for.
Update the code to accept BAD_FILE sources in addition to zeros, these
are added in some cases as padding and effectively are don't care
values, so we can assume them zeros.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
This is a preparation to (re-)enable opt_zero_samples(), which will reduce
a SEND mlen before we split it. When that happen, opt_split_sends()
won't be able to rely on the fact that mlen covers the entire
LOAD_PAYLOAD.
Since we are changing that, take the opportunity to also not modify the
existing LOAD_PAYLOAD, just create two new ones with the exact set of
sources. This allows the pass to be further simplified by iterating
forward and not require live_variables analysis.
The helper function was added so can be used later for
opt_zero_samples().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
For Task/Mesh WorkgroupID is now lowered to WorkgroupIndex by the
generic NIR pass, so we shouldn't hit this. We can now simplify the
asserting code in emit_work_group_id_setup().
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25977>
We use SHADER_OPCODE_SEND directly instead of FS_OPCODE_FB_WRITE (for a
while now) and FS_OPCODE_REP_FB_WRITE (since the previous commit).
Assert that it isn't used on Gfx7+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
Sandybridge uses this code and needs MRFs, but all other platforms send
from GRFs. Do that directly rather than relying on the MRF hack.
Ivybridge and later also use SHADER_OPCODE_SEND directly rather than
a virtual opcode that's handled in the generator, so we follow suit.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
We never set key->clamp_fragment_color when compiling the BLORP fast
clear shaders. Besides, we were setting saturate on an FB write opcode,
which...isn't even a thing. We would need it on the MOV, and weren't
setting it there. So it wouldn't have even worked.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This is basically just once through the loop but copy and pasted. One
difference is that the single render target case used a headerless
message, and the multiple render target case always used headers.
Now we use headerless messages for the first render target, even in the
multiple render target case. While we already have it set up for the
other RTs, it's still 2 fewer registers to send. Minor improvement.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
A long time ago, we used a uniform for the clear color. Back in 2014,
we added support for using a flat input instead, as this was easier for
Vulkan, but we left the option of using a uniform for OpenGL.
Eventually nobody used the uniform approach anymore, but the compiler
code to handle it remained. Drop the dead code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something. In the olden days, we'd use it for
platform enabling. I can't think of the last time we did that, though.
I also used to use it for debugging. If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.
These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders. So we don't
need to resort iterative driver hacks anymore. Again, I can't think of
the last time I actually did that.
So, this code is basically just dead. And it's using legacy MRF paths,
which we could update...or we could just delete it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
We can end up in situation where we are dispatched with a multisample
framebuffer but not at per-sample. In this case we would request the
at_sample value with the wrong message configuration.
Relying on the BRW_WM_MSAA_FLAG_MULTISAMPLE_FBO flag superseeds
BRW_WM_MSAA_FLAG_PERSAMPLE_DISPATCH.
Fixes piglit tests :
spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatsample*
With Zink on Anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25854>
While the fs_visitor::validate() implementation is empty in release
build, we still emit calls to it since it is defined in a separate
compilation unit than its callers. To fix this, just expose an inline
empty function in the header for the release mode.
Fossil run time differences in TGL laptop (difference at 95.0% confidence):
```
Rise of The Tomb Rider (Native) [n=7]
-0.482857 +/- 0.010932
-1.60608% +/- 0.0363621%
Cyberpunk 2077 (DXVK) [n=7]
-0.987143 +/- 0.0904516
-0.82996% +/- 0.076049%
Batman Arkham City (DXVK) [n=7]
-7.74857 +/- 0.329561
-1.46298% +/- 0.0622231%
```
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25847>
There is not real need to use the spirv-linker here at all,
we can just read all the CL C files into one buffer, then compile
that buffer in a single pass.
This worksaround an issue seen with llvm17 and opaque pointers
and the spirv linker.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25759>
First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
While working on cooperative matrix support, I noticed some invalid
DP4A instructions being generated.
dp4a(32) g33<1>UD g21<8,8,1>UD g1.0<0,1,0>UD g9<1,1,1>UD
This violates the constraint that the destination or a source can only
access two consecutive GRFs.
I'm a little surprised that validation didn't catch this. Perhaps
because it's a 3 source instruction? Either way, it seems like a bigger
project to fix that.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25554>
In the default case, there's a special case with a few conditions.
Prefer the cheapest conditions first, so we can take advantage of
short-circuiting.
Effect is a small but still significant reduce in shader compilation
times, as can be seen by:
- Fossil replay for Rise of the Tomb Raider
```
Difference at 95.0% confidence
-0.433333 +/- 0.028609
-1.42556% +/- 0.0941163%
(Student's t, pooled s = 0.0337886)
```
- Fossil replay for Batman Arkham City
```
Difference at 95.0% confidence
-8.84 +/- 0.146083
-1.65932% +/- 0.0274207%
(Student's t, pooled s = 0.125423)
```
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25549>
SIMD1 instructions are problematic because they are considered partial
writes. This increases the liveness of the destination register
written by those instructions. To workaround this we use UNDEF
instructions to bound the liveness of the register. But this causing
other issues like in this case :
undef(1) vgrf2
mov(1) vgrf2, u4.0
add(1) vgrf3, vgrf2.0, 64UD
In this case the copy propagation pass in unable to see that vgrf2 in
the add() instruction can be replaced with the uniform u4.0.
To fix this problem, we switch NoMask SIMD8 instructions that cover
the entire register. We can drop the UNDEF instructions and now copy
propagation can do its job.
Good results on 2 apps :
Cyberpunk 2077 :
Totals from 7258 (68.80% of 10549) affected shaders:
Instrs: 6332210 -> 6073833 (-4.08%); split: -4.11%, +0.03%
Cycles: 130667501 -> 127351268 (-2.54%); split: -3.12%, +0.58%
Subgroup size: 90320 -> 90400 (+0.09%)
Spill count: 90 -> 68 (-24.44%)
Fill count: 82 -> 64 (-21.95%)
Scratch Memory Size: 8192 -> 6144 (-25.00%)
Max live registers: 385464 -> 375152 (-2.68%)
Max dispatch width: 64336 -> 64424 (+0.14%); split: +0.96%, -0.82%
Gaining 60 SIMD16/SIMD32 shaders, loosing 33
Strange Brigade :
Totals from 2137 (53.12% of 4023) affected shaders:
Instrs: 1544031 -> 1457544 (-5.60%); split: -5.60%, +0.00%
Cycles: 22292564 -> 21868978 (-1.90%); split: -2.43%, +0.53%
Subgroup size: 25328 -> 25344 (+0.06%)
Max live registers: 113716 -> 111214 (-2.20%)
Max dispatch width: 17232 -> 18608 (+7.99%); split: +8.36%, -0.37%
Gaining 138 SIMD16/SIMD32 shaders, loosing 4
On app slightly negatively affected :
Dota2 :
Totals from 232 (14.73% of 1575) affected shaders:
Instrs: 30029 -> 28194 (-6.11%)
Cycles: 385155 -> 371422 (-3.57%); split: -3.59%, +0.02%
Max live registers: 6792 -> 6780 (-0.18%)
Max dispatch width: 2256 -> 2160 (-4.26%)
Loosing 6 SIMD32 shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>
Some recent NIR changes started generated those instructions. We need
to handle them to be able to rematerialize.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>