The current tracking seems to have hidden issues related to MCS
ambiguate that are currently hidden by the fact that we're inserting
pb-stall+RT-flush on BTI changes which we're going to be remove in the
next commits.
The issues appear to be related to a missing pb-stall+RT-flush between
MCS ambiguate and fast-clear causing failures on the following tests
once BTP+BTI RCC caching is enabled :
dEQP-VK.pipeline.*.multisample.misc.*multi*
dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_39x41_ms
dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_48x48_ms
Here we rework the tracking with a new enum to track 3 classes of
operations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
Instead of :
foreach color attachment
transition layout
fast clear
slow clear
do this :
foreach color attachment
transition layout
foreach color attachment
fast clear
foreach color attachment
slow clear
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
When rendering only has depth/stencil, we need to look at the
depth/stencil view size to generate a dummy null color attachments. So
do that first, so we don't have to iterate color attachments once more
with the final size.
This change also has the nice impact of removing a BTI change flush
due to the sequence moving from :
- before blorp BTI-flush
- color fast-clear
- after blorp BTI-flush
- depth fast-clear
- change RT due to shader outputs (BTI-flush)
- draw call
to :
- depth fast-clear
- before blorp BTI-flush
- color fast-clear
- combined after blorp BTI-flush (pending)
- change RT due to shader outputs (BTI-flush, combined with above)
- draw call
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
This is going to bite us a lot more when RCC BTP+BTI is enabled.
In particular this test will hang pretty reliably on LNL :
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.suballocation.multisample_resolve.layers_3.r32g32_sfloat.samples_4_baseLayer1
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f66ff97d58 ("drirc/anv: implement steps to disable RHWO for Wa_14024015672")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
textureGather() returns the four taps that would have been filtered
together to produce the value that ordinary texturing operations
would return. As such, it should access the same data, so we can use
either interchangeably when we're only checking for residency and not
returning the actual data.
This allows us to mask out some unneeded registers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>
This splits a single texture-with-residency operation into two halves,
one which returns texture data, and another which queries residency.
We're currently using this only for a shadow sampling workaround,
but the technique is more broadly applicable, if we ever wanted.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>
When validation fails we print instructions to use INTEL_DEBUG=shaders
but that will not help if we assert before dumping shader debug log.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40529>
vk_clock_gettime hasn't been used by other implementations ever since
venus and kk migrated over to the common implementation. It'd be better
to drop that helper (or move into anv) because it's not OS agnostic as
compare to the more comprehensive vk_device_get_timestamp.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40582>
One frustrating thing about the CMP and CMPN instructions is that they
always write the flags. Sometimes, however, it is desirable to generate
the comparison result without modifying the flags. This would,
theoretically, reduce false dependencies that restrict the scheduler's
ability to rearrange code, create more opportunities for cmod
propagation, save a kitten from a tree, and make a rainbow.
Consider this sequence:
cmp.ge.f0.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F
cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D
(+f0.0) if(8) JIP: LABEL19 UIP: LABEL19
It would be advantageous to put the first CMP between the second CMP and
the IF, but this cannot be done since the IF depends on the flags generated
by the second CMP.
This pass enables this rescheduling by changing the first CMP to write
to a different flags register.
cmp.ge.f1.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F
cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D
(+f0.0) if(8) JIP: LABEL19 UIP: LABEL19
Sometimes this is also possible by using a different instruction. For
example, consider
cmp.l.f0.0(8) g103<1>D g101<8,8,1>D 0D
This produces 0xffffffff when g101 negative and zero otherwise. This
instruction, which does not modifiy the flag, also produces these results:
asr(8) g103<1>D g101<8,8,1>D 31D
Gfx9 platforms take a hit on instructions due to the instruction added
at the end of short shaders by brw_workaround_source_arf_before_eot.
shader-db:
Lunar Lake, Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 17089451 -> 17088766 (<.01%)
instructions in affected programs: 766613 -> 765928 (-0.09%)
helped: 653 / HURT: 0
total cycles in shared programs: 888832986 -> 887873068 (-0.11%)
cycles in affected programs: 549441852 -> 548481934 (-0.17%)
helped: 10474 / HURT: 130
LOST: 9
GAINED: 0
Skylake
total instructions in shared programs: 19037976 -> 19049719 (0.06%)
instructions in affected programs: 3979914 -> 3991657 (0.30%)
helped: 503 / HURT: 12303
total cycles in shared programs: 867918242 -> 866930801 (-0.11%)
cycles in affected programs: 512773919 -> 511786478 (-0.19%)
helped: 13858 / HURT: 66
LOST: 32
GAINED: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 925023504 -> 924950382 (-0.01%); split: -0.01%, +0.00%
Cycle count: 106348432916 -> 106116809009 (-0.22%); split: -0.22%, +0.00%
Spill count: 3423988 -> 3423930 (-0.00%); split: -0.00%, +0.00%
Fill count: 4877087 -> 4876960 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 49087552 -> 49078448 (-0.02%); split: +0.00%, -0.02%
Totals from 1099332 (54.44% of 2019443) affected shaders:
Instrs: 742670473 -> 742597351 (-0.01%); split: -0.01%, +0.00%
Cycle count: 100455549635 -> 100223925728 (-0.23%); split: -0.23%, +0.00%
Spill count: 3384366 -> 3384308 (-0.00%); split: -0.00%, +0.00%
Fill count: 4837434 -> 4837307 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 26725152 -> 26716048 (-0.03%); split: +0.00%, -0.03%
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997603774 -> 997529238 (-0.01%); split: -0.01%, +0.00%
Cycle count: 93904012762 -> 93646730006 (-0.27%); split: -0.28%, +0.00%
Spill count: 3710155 -> 3710125 (-0.00%); split: -0.00%, +0.00%
Fill count: 5032908 -> 5032819 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 37929640 -> 37811560 (-0.31%)
Totals from 1334920 (58.52% of 2281134) affected shaders:
Instrs: 817377787 -> 817303251 (-0.01%); split: -0.01%, +0.00%
Cycle count: 88468851658 -> 88211568902 (-0.29%); split: -0.29%, +0.00%
Spill count: 3663353 -> 3663323 (-0.00%); split: -0.00%, +0.00%
Fill count: 4991629 -> 4991540 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 20245832 -> 20127752 (-0.58%)
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013433769 -> 1013363273 (-0.01%); split: -0.01%, +0.00%
Cycle count: 85766921182 -> 85509316620 (-0.30%); split: -0.31%, +0.00%
Spill count: 3903923 -> 3903944 (+0.00%); split: -0.00%, +0.00%
Fill count: 6801983 -> 6801948 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37896320 -> 37805320 (-0.24%); split: +0.00%, -0.24%
Totals from 1333814 (58.54% of 2278396) affected shaders:
Instrs: 830200531 -> 830130035 (-0.01%); split: -0.01%, +0.00%
Cycle count: 80746184101 -> 80488579539 (-0.32%); split: -0.32%, +0.01%
Spill count: 3855771 -> 3855792 (+0.00%); split: -0.00%, +0.00%
Fill count: 6755513 -> 6755478 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 20301456 -> 20210456 (-0.45%); split: +0.00%, -0.45%
Skylake
Totals:
Instrs: 519389758 -> 519874108 (+0.09%); split: -0.00%, +0.10%
Cycle count: 57932316132 -> 57789433956 (-0.25%); split: -0.25%, +0.00%
Spill count: 636741 -> 636715 (-0.00%); split: -0.01%, +0.00%
Fill count: 860470 -> 860357 (-0.01%); split: -0.02%, +0.00%
Max dispatch width: 32527800 -> 32481792 (-0.14%); split: +0.00%, -0.14%
Totals from 1080380 (62.25% of 1735462) affected shaders:
Instrs: 411976399 -> 412460749 (+0.12%); split: -0.00%, +0.12%
Cycle count: 54291447615 -> 54148565439 (-0.26%); split: -0.27%, +0.00%
Spill count: 602993 -> 602967 (-0.00%); split: -0.01%, +0.00%
Fill count: 734459 -> 734346 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 18626096 -> 18580088 (-0.25%); split: +0.00%, -0.25%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
The Bspec says that SEL sources and destination can be any mix of *B,
*W, and *D. We should allow those. Specifically, without this change,
this instruction
sel.sat.l(8) v548:UD, v899:D, 255d
gets unnecessarily split into two instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
Prevents assertion failures in func.shader-ballot.basic.q0 and other
tests starting with "nir/algebraic: Optimize some b2f of integer
comparison".
Vector immediates, bfloat, and 8-bit floats are still not supported.
v2: Almost complete re-write based on suggestions from Ken.
v3: Don't retype() on a brw_imm_f value.
Fixes: f8e54d02f7 ("intel/compiler: Relax mixed type restriction for saturating immediates")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
Simulator is crashing when receiving GPGPU + Pixel as resource barrier signal
stage, what according to spec is invalid.
So here replacing the pixel stage by color, over synchronizing it a bit but
keeping it functional.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14641
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
Simulator hangs if a resource barrier has wait stage = None, HW seens
to don't care but something bad could be happning internaly.
So here making sure Wait stage is set to TOP when it is None.
Simulator hangs if a resource barrier has wait stage = None.
The HW seems to ignore it, but something bad could be happening internally.
So here I'm making sure the wait stage is set to TOP when it is None.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
_mesa_sha1_format has a few remaining uses, so it's moved to build_id.c,
which is its last user.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
Dual source blending when one of the sources is not written to leaves
those values undefined, but the other should still be valid.
By omitting unwritten outputs, we ended up not writing anything at all
for the case that OUT1 is written to but OUT0 is undefined.
Fixes new CTS tests: dEQP-VK.pipeline.*.blend.dual_source.undefined_output.first*
Cc: mesa-stable
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40357>
When there is no trace pointer, there is usually a another tracepoint
being emitted (see STATE_BASE_ADDRESS,
3DSTATE_BINDING_TABLE_POOL_ALLOC emission).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40503>
In commit 10b5b279a4 ("anv: Fix CmdResetEvent2() with RESOURCE_BARRIER::Wait stage == none")
I haved added assert to catch invalid cases but looks like we have several tests
affected by that problem causing crashes in debug builds.
So here I'm removing those asserts(), will then work on all the fixes and bring
it back.
Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40476>
On integrated platforms, we have issue where L3 cache not being coherent
with CS and it forces us to push data out L3.
To avoid data cache flush, let's write the IR header with BLORP shader.
There is a small shader launch latency but eventually that should not
matter because writing data with CS (MI_STORE) commands is slower than
shader execution when we consider large number of BVH tree getting
built.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39971>
Add dedicated BLORP op enums so clear paths can be represented
precisely.
This is enum-only groundwork; behavior and trace output are wired in
follow-up commits.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40414>
The runtime builds a final pipeline state with pointers to structures
coming from the associated pipelines libraries.
So far it has considered that the viewMask was part of a structure
together with the rest of the renderpass information. This information
can be specified in pre-raster, fragment & color-output state groups
and it was assumed would be consistent for all 3. And the runtime
currently takes the pointer to the structure from the last pipeline
library (color output).
Some coming spec/cts will clarify that the viewMask only needs to be
specified for pre-raster & fragment groups, making the value in the
color-output group untrustworthy.
This change creates a new state structure to hold the viewMask on its
own so it is only gather on pre-raster & fragment groups.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Aitor Camacho <aitor@lunarg.com> (kosmickrisp)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (turnip)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Frank Binns <frank.binns@imgtec.com> (powervr)
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (panvk)
Royaled-yes-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39940>