A long time ago, we used a uniform for the clear color. Back in 2014,
we added support for using a flat input instead, as this was easier for
Vulkan, but we left the option of using a uniform for OpenGL.
Eventually nobody used the uniform approach anymore, but the compiler
code to handle it remained. Drop the dead code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something. In the olden days, we'd use it for
platform enabling. I can't think of the last time we did that, though.
I also used to use it for debugging. If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.
These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders. So we don't
need to resort iterative driver hacks anymore. Again, I can't think of
the last time I actually did that.
So, this code is basically just dead. And it's using legacy MRF paths,
which we could update...or we could just delete it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
While working on cooperative matrix support, I noticed some invalid
DP4A instructions being generated.
dp4a(32) g33<1>UD g21<8,8,1>UD g1.0<0,1,0>UD g9<1,1,1>UD
This violates the constraint that the destination or a source can only
access two consecutive GRFs.
I'm a little surprised that validation didn't catch this. Perhaps
because it's a 3 source instruction? Either way, it seems like a bigger
project to fix that.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25554>
In the default case, there's a special case with a few conditions.
Prefer the cheapest conditions first, so we can take advantage of
short-circuiting.
Effect is a small but still significant reduce in shader compilation
times, as can be seen by:
- Fossil replay for Rise of the Tomb Raider
```
Difference at 95.0% confidence
-0.433333 +/- 0.028609
-1.42556% +/- 0.0941163%
(Student's t, pooled s = 0.0337886)
```
- Fossil replay for Batman Arkham City
```
Difference at 95.0% confidence
-8.84 +/- 0.146083
-1.65932% +/- 0.0274207%
(Student's t, pooled s = 0.125423)
```
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25549>
This is what most logical SEND messages do when they take a variable
number of components. 'inst->mlen' is expected to be zero for logical
SEND opcodes, which are expected to behave like plain arithmetic
operations, so certain automated transformations (like SIMD lowering)
can manipulate them without opcode-specific special-casing.
Guessing the number of components from 'inst->mlen' has other
disadvantages, because it requires duplicating the logic that infers
the message payload size in every use of the instruction -- Instead we
can just do the computation once during logical send lowering. In
addition on LNL platform this causes the 'inst->mlen' field of URB
writes to have units inconsistent with every other SEND instruction,
which is likely to lead to confusion and bugs down the road.
Rework:
* Marcin: update emit_urb_indirect_vec4_write
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
Those are used in the failure paths and are easily retriavable from the
stage itself, so no need to store them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25367>
The position in the error array already indicate the SIMD in question,
so take off all the formatted printing from the errors -- which in some
cases were just not needed. We lose a little bit of extra context but
it is all easily derivable from the message and the SIMD.
This also will remove the overhead when SIMD selection is being used to
just to find the selected dispatch width -- at a point where the shaders
were already compiled -- and the errors are not used at all.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9849
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25336>
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.
Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
We try various pre-RA scheduler modes and see if any of them allow
us to register allocate without spilling. If all of them spill,
however, we left it on the last mode: LIFO. This is unfortunately
sometimes significantly worse than other modes (such as "none").
This patch makes us instead select the pre-RA scheduling mode that
gives the lowest register pressure estimate, if none of them manage
to avoid spilling. The hope is that this scheduling will spill the
least out of all of them.
fossil-db stats (on Alchemist) speak for themselves:
Totals:
Instrs: 197297092 -> 195326552 (-1.00%); split: -1.02%, +0.03%
Cycles: 14291286956 -> 14303502596 (+0.09%); split: -0.55%, +0.64%
Spill count: 190886 -> 129204 (-32.31%); split: -33.01%, +0.70%
Fill count: 361408 -> 225038 (-37.73%); split: -39.17%, +1.43%
Scratch Memory Size: 12935168 -> 10868736 (-15.98%); split: -16.08%, +0.10%
Totals from 1791 (0.27% of 668386) affected shaders:
Instrs: 7628929 -> 5658389 (-25.83%); split: -26.50%, +0.67%
Cycles: 719326691 -> 731542331 (+1.70%); split: -10.95%, +12.65%
Spill count: 110627 -> 48945 (-55.76%); split: -56.96%, +1.20%
Fill count: 221560 -> 85190 (-61.55%); split: -63.89%, +2.34%
Scratch Memory Size: 4471808 -> 2405376 (-46.21%); split: -46.51%, +0.30%
Improves performance when using XeSS in Cyberpunk 2077 by 90% on A770.
Improves performance of Borderlands 3 by 1.54% on A770.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
This moves a bit of code out of a large function, but also lets us reuse
it a few extra places in the next commit.
I opted to stop using ralloc here since this is short-lived data that
doesn't need to stick around for the rest of the compile, and it's easy
enough to free.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
pre_modes[] is an array with the modes ordered in our desired
preference. scheduler_mode_name[] was also in that order, and the two
had to be kept in sync. This is a little silly; we should just have
a mode enum -> string table and look it up via the enum.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
I'm going to introduce another call site for this function, and just
handling SCHEDULE_NONE in the scheduler itself makes more sense than
duplicating the logic.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
The register pressure analysis I wrote in 2013 only considered VGRFs,
and not other GRFs, such as payload registers and push constants. We
need to consider those too, because payload registers definitely occupy
space and add to pressure.
In 2015, Connor already made the scheduler account for this, so the only
real use for this is in shader statistic dumps and optimizer printouts.
But we should make it more accurate. (We will use it in more places
shortly, a few commits from now.)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
If the NIR_DEBUG_PRINT_INTERNAL flag is not set, don't print debugging
information for internal shaders in INTEL_DEBUG=optimizer dumps.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24684>
This reverts commit 6b494745be.
The logic is not entirely correct: the comparison is between two
static-analysis estimates of a dynamic system with variables that aren't
captured by the shader source, so using ">" will always have greater potential
to cause regressions whenever the performance difference between the two builds
is something not captured by the static model, no matter how much the model is
improved.
Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9262
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24615>
Instead, we replace every use of it with nir_def. Most of this commit
was generated by sed:
sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
A few manual fixups were required in lima and the nir_legacy code.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
It can be useful to compare 2 runs with different compiler changes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
One particular nice thing to have is the first generated backend IR
before validation. Especially if you made a mistake in the NIR
translation, you can at least look at it before validation tells you
off.
Then the last 2 steps of the optimize() function can be interesting to
look at.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
And split them into UBO and SSBO
v2 (Lionel):
- Get rid of robustness fields in anv_shader_bin
v3 (Lionel):
- Do not pass unused parameters around
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17545>
Purely from the backend point of view it's just an additional
parameter to sampler messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
A few titles show max live register reductions, but nothing
significant in instruction count or other stats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
This is a modified version of a commit originally in !7698. This version
add the changes to brw_fs_copy_propagation. If the address passed to
fs_visitor::swizzle_nir_scratch_addr is a constant, that function will
generate SHL with two constant sources.
DG2 uses a different path to generate those addresses, so the constant
folding can't occur there yet. That will be addressed in the next
commit.
What follows is the commit change history from that older MR.
v2: Previously this commit was after `intel/fs: Combine constants for
integer instructions too`. However, this commit can create invalid
instructions that are only cleaned up by `intel/fs: Combine constants
for integer instructions too`. That would potentially affect the
shader-db results of each commit, but I did not collect new data for
the reordering.
v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for
!saturate. Both suggested by Ken. Also add an assertion that B/UB types
don't matically come back.
v4: Fix sources count. See also ed3c2f73db ("intel/fs: fixup sources
number from opt_algebraic").
v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a
comment added when pulling this commit out of !7698. Noticed by Ken.
shader-db results:
DG2
No changes.
Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20655696 -> 20651648 (-0.02%)
instructions in affected programs: 23125 -> 19077 (-17.50%)
helped: 7 / HURT: 0
total cycles in shared programs: 858436639 -> 858407749 (<.01%)
cycles in affected programs: 8990532 -> 8961642 (-0.32%)
helped: 7 / HURT: 0
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18500780 -> 18496630 (-0.02%)
instructions in affected programs: 24715 -> 20565 (-16.79%)
helped: 7 / HURT: 0
total cycles in shared programs: 946100660 -> 946087688 (<.01%)
cycles in affected programs: 5838252 -> 5825280 (-0.22%)
helped: 7 / HURT: 0
total spills in shared programs: 17588 -> 17572 (-0.09%)
spills in affected programs: 1206 -> 1190 (-1.33%)
helped: 2 / HURT: 0
total fills in shared programs: 25192 -> 25156 (-0.14%)
fills in affected programs: 156 -> 120 (-23.08%)
helped: 2 / HURT: 0
No shader-db changes on any older Intel platforms.
fossil-db results:
DG2
Totals:
Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00%
Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00%
Totals from 16 (0.00% of 668055) affected shaders:
Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17%
Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01%
Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 153425977 -> 153423678 (-0.00%)
Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 8535968 -> 8535976 (+0.00%)
Send messages: 7697606 -> 7697607 (+0.00%)
Scratch Memory Size: 4380672 -> 4381696 (+0.02%)
Totals from 6 (0.00% of 662749) affected shaders:
Instrs: 13893 -> 11594 (-16.55%)
Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43%
Subgroup size: 80 -> 88 (+10.00%)
Send messages: 675 -> 676 (+0.15%)
Scratch Memory Size: 91136 -> 92160 (+1.12%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
opt_copy_propagation can create invalid instructions like
shl(8) vgrf96:UD, 2d, 8u
These instructions will be cleaned up by opt_algebraic. The irony is
opt_algebraic converts these to simple mov instructions that
opt_copy_propagation should clean up. I don't think we want a loop like
do {
progress = false;
if (OPT(opt_copy_propagation)) {
OPT(opt_algebraic);
OPT(dead_code_eliminate);
}
} while (progress);
But maybe we do?
Maybe this would be sufficient:
while (OPT(opt_copy_propagation))
OPT(opt_algebraic);
OPT(dead_code_eliminate);
No shader-db or fossil-db changes (yet) on any Intel platform. This is
expected.
v2: Do opt_algebraic immediately after every call to
opt_copy_propagation instead of being clever. Suggested by Lionel.
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
Both per-primitive and per-vertex space is allocated in MUE in 8 dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).
Some attributes have a special meaning and must be placed in separate
8/16-dword slot called Primitive Header or Vertex Header.
Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.
Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.
This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.
Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.
16/16 is better than 24/16, so packing makes sense.
Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.
24/24 is worse than 24/16, so packing doesn't make sense.
This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.
There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
float[N] requires N 1-dword slots, but
i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
if possible is beneficial
This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.
Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>