When validation fails we print instructions to use INTEL_DEBUG=shaders,
but that will not help if we assert before dumping the shader debug log.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40529>
vk_clock_gettime hasn't been used by other implementations ever since
venus and kk migrated over to the common implementation. It'd be better
to drop that helper (or move it into anv) because it's not OS agnostic
compared to the more comprehensive vk_device_get_timestamp.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40582>
One frustrating thing about the CMP and CMPN instructions is that they
always write the flags. Sometimes, however, it is desirable to generate
the comparison result without modifying the flags. This would,
theoretically, reduce false dependencies that restrict the scheduler's
ability to rearrange code, create more opportunities for cmod
propagation, save a kitten from a tree, and make a rainbow.
Consider this sequence:
cmp.ge.f0.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F
cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D
(+f0.0) if(8) JIP: LABEL19 UIP: LABEL19
It would be advantageous to put the first CMP between the second CMP and
the IF, but this cannot be done since the IF depends on the flags generated
by the second CMP.
This pass enables this rescheduling by changing the first CMP to write
to a different flags register.
cmp.ge.f1.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F
cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D
(+f0.0) if(8) JIP: LABEL19 UIP: LABEL19
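As a rough illustration of why the rename helps, here is a toy flag-dependency check in C. It is only a sketch: the struct, the function name, and the numbering of flag subregisters (f0.0 = 0, f1.0 = 2) are hypothetical and are not taken from the brw scheduler, and only flag dependencies are modeled.

```c
#include <stdbool.h>

#define SKETCH_NO_FLAG (-1)

/* Hypothetical, minimal view of an instruction: which flag subregister it
 * writes and which one it reads (SKETCH_NO_FLAG if none). */
struct sketch_inst {
    int flag_written;
    int flag_read;
};

/* Returns true if the two instructions touch the same flag subregister in a
 * way that forbids swapping them (write/write, write/read or read/write). */
static inline bool
sketch_flags_conflict(const struct sketch_inst *a, const struct sketch_inst *b)
{
    bool waw = a->flag_written != SKETCH_NO_FLAG &&
               a->flag_written == b->flag_written;
    bool raw = a->flag_written != SKETCH_NO_FLAG &&
               a->flag_written == b->flag_read;
    bool war = a->flag_read != SKETCH_NO_FLAG &&
               a->flag_read == b->flag_written;
    return waw || raw || war;
}
```

With both CMPs writing f0.0 the check reports a conflict. Once the first CMP writes f1.0 instead, it no longer conflicts with the second CMP or with the IF.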
Sometimes this is also possible by using a different instruction. For
example, consider
cmp.l.f0.0(8) g103<1>D g101<8,8,1>D 0D
This produces 0xffffffff when g101 is negative and zero otherwise. This
instruction, which does not modify the flag, also produces these results:
asr(8) g103<1>D g101<8,8,1>D 31D
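The equivalence can be checked per channel with a small C sketch. The function names are made up for illustration, and the arithmetic shift is spelled out portably because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Per-channel result of "cmp.l dst, src, 0": all ones if src < 0, else 0. */
static inline int32_t
sketch_cmp_l_zero(int32_t src)
{
    return src < 0 ? -1 : 0;
}

/* Per-channel result of "asr dst, src, 31": the sign bit replicated into
 * every bit position. Written without relying on signed right shift. */
static inline int32_t
sketch_asr_31(int32_t src)
{
    /* Logical shift moves the sign bit down to bit 0 (value 0 or 1). */
    int32_t sign = (int32_t)((uint32_t)src >> 31);

    /* Negating gives 0 or -1, i.e. 0x00000000 or 0xffffffff. */
    return -sign;
}
```

Both helpers return the same value for every 32-bit input, which is why the ASR can stand in for the CMP when only the destination value is needed.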
Gfx9 platforms take a hit on instructions due to the instruction added
at the end of short shaders by brw_workaround_source_arf_before_eot.
shader-db:
Lunar Lake, Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 17089451 -> 17088766 (<.01%)
instructions in affected programs: 766613 -> 765928 (-0.09%)
helped: 653 / HURT: 0
total cycles in shared programs: 888832986 -> 887873068 (-0.11%)
cycles in affected programs: 549441852 -> 548481934 (-0.17%)
helped: 10474 / HURT: 130
LOST: 9
GAINED: 0
Skylake
total instructions in shared programs: 19037976 -> 19049719 (0.06%)
instructions in affected programs: 3979914 -> 3991657 (0.30%)
helped: 503 / HURT: 12303
total cycles in shared programs: 867918242 -> 866930801 (-0.11%)
cycles in affected programs: 512773919 -> 511786478 (-0.19%)
helped: 13858 / HURT: 66
LOST: 32
GAINED: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 925023504 -> 924950382 (-0.01%); split: -0.01%, +0.00%
Cycle count: 106348432916 -> 106116809009 (-0.22%); split: -0.22%, +0.00%
Spill count: 3423988 -> 3423930 (-0.00%); split: -0.00%, +0.00%
Fill count: 4877087 -> 4876960 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 49087552 -> 49078448 (-0.02%); split: +0.00%, -0.02%
Totals from 1099332 (54.44% of 2019443) affected shaders:
Instrs: 742670473 -> 742597351 (-0.01%); split: -0.01%, +0.00%
Cycle count: 100455549635 -> 100223925728 (-0.23%); split: -0.23%, +0.00%
Spill count: 3384366 -> 3384308 (-0.00%); split: -0.00%, +0.00%
Fill count: 4837434 -> 4837307 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 26725152 -> 26716048 (-0.03%); split: +0.00%, -0.03%
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997603774 -> 997529238 (-0.01%); split: -0.01%, +0.00%
Cycle count: 93904012762 -> 93646730006 (-0.27%); split: -0.28%, +0.00%
Spill count: 3710155 -> 3710125 (-0.00%); split: -0.00%, +0.00%
Fill count: 5032908 -> 5032819 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 37929640 -> 37811560 (-0.31%)
Totals from 1334920 (58.52% of 2281134) affected shaders:
Instrs: 817377787 -> 817303251 (-0.01%); split: -0.01%, +0.00%
Cycle count: 88468851658 -> 88211568902 (-0.29%); split: -0.29%, +0.00%
Spill count: 3663353 -> 3663323 (-0.00%); split: -0.00%, +0.00%
Fill count: 4991629 -> 4991540 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 20245832 -> 20127752 (-0.58%)
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013433769 -> 1013363273 (-0.01%); split: -0.01%, +0.00%
Cycle count: 85766921182 -> 85509316620 (-0.30%); split: -0.31%, +0.00%
Spill count: 3903923 -> 3903944 (+0.00%); split: -0.00%, +0.00%
Fill count: 6801983 -> 6801948 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37896320 -> 37805320 (-0.24%); split: +0.00%, -0.24%
Totals from 1333814 (58.54% of 2278396) affected shaders:
Instrs: 830200531 -> 830130035 (-0.01%); split: -0.01%, +0.00%
Cycle count: 80746184101 -> 80488579539 (-0.32%); split: -0.32%, +0.01%
Spill count: 3855771 -> 3855792 (+0.00%); split: -0.00%, +0.00%
Fill count: 6755513 -> 6755478 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 20301456 -> 20210456 (-0.45%); split: +0.00%, -0.45%
Skylake
Totals:
Instrs: 519389758 -> 519874108 (+0.09%); split: -0.00%, +0.10%
Cycle count: 57932316132 -> 57789433956 (-0.25%); split: -0.25%, +0.00%
Spill count: 636741 -> 636715 (-0.00%); split: -0.01%, +0.00%
Fill count: 860470 -> 860357 (-0.01%); split: -0.02%, +0.00%
Max dispatch width: 32527800 -> 32481792 (-0.14%); split: +0.00%, -0.14%
Totals from 1080380 (62.25% of 1735462) affected shaders:
Instrs: 411976399 -> 412460749 (+0.12%); split: -0.00%, +0.12%
Cycle count: 54291447615 -> 54148565439 (-0.26%); split: -0.27%, +0.00%
Spill count: 602993 -> 602967 (-0.00%); split: -0.01%, +0.00%
Fill count: 734459 -> 734346 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 18626096 -> 18580088 (-0.25%); split: +0.00%, -0.25%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
The Bspec says that SEL sources and destination can be any mix of *B,
*W, and *D. We should allow those. Specifically, without this change,
this instruction
sel.sat.l(8) v548:UD, v899:D, 255d
gets unnecessarily split into two instructions.
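For reference, this is a C sketch of what that single instruction is expected to compute per channel. It reflects my reading of the semantics (sel.l picks the smaller source, and integer saturation clamps to the destination type's range), not text quoted from the Bspec, and the helper names are hypothetical.

```c
#include <stdint.h>

/* sel.l: select the smaller of the two sources (signed compare, since the
 * sources are of type D). */
static inline int32_t
sketch_sel_l(int32_t a, int32_t b)
{
    return a < b ? a : b;
}

/* Assumed .sat behavior for a D value written to a UD destination:
 * negative values clamp to 0, everything else fits as-is. */
static inline uint32_t
sketch_sat_d_to_ud(int32_t v)
{
    return v < 0 ? 0u : (uint32_t)v;
}

/* sel.sat.l dst:UD, src:D, 255d */
static inline uint32_t
sketch_sel_sat_l_255(int32_t src)
{
    return sketch_sat_d_to_ud(sketch_sel_l(src, 255));
}
```

Under these assumptions the instruction clamps the source to the range [0, 255] in one step.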
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
Prevents assertion failures in func.shader-ballot.basic.q0 and other
tests starting with "nir/algebraic: Optimize some b2f of integer
comparison".
Vector immediates, bfloat, and 8-bit floats are still not supported.
v2: Almost complete re-write based on suggestions from Ken.
v3: Don't retype() on a brw_imm_f value.
Fixes: f8e54d02f7 ("intel/compiler: Relax mixed type restriction for saturating immediates")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
The simulator crashes when it receives GPGPU + Pixel as the resource barrier
signal stage, which according to the spec is invalid.
So replace the pixel stage with color; this over-synchronizes a bit but
keeps it functional.
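A minimal sketch of that fixup in C follows. The bit names and the function are hypothetical and do not come from anv. It also assumes that the replacement is only needed when GPGPU and Pixel are combined, which the message implies but does not state outright.

```c
#include <stdint.h>

/* Hypothetical signal-stage bits, for illustration only. */
#define SKETCH_SIGNAL_GPGPU (1u << 0)
#define SKETCH_SIGNAL_PIXEL (1u << 1)
#define SKETCH_SIGNAL_COLOR (1u << 2)

/* If GPGPU and Pixel are both requested, swap Pixel for Color so the
 * combination handed to the hardware (or simulator) is a valid one. */
static inline uint32_t
sketch_fixup_signal_stages(uint32_t stages)
{
    if ((stages & SKETCH_SIGNAL_GPGPU) && (stages & SKETCH_SIGNAL_PIXEL)) {
        stages &= ~SKETCH_SIGNAL_PIXEL;
        stages |= SKETCH_SIGNAL_COLOR;
    }
    return stages;
}
```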
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14641
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
The simulator hangs if a resource barrier has wait stage = None.
The HW seems to ignore it, but something bad could be happening internally.
So make sure the wait stage is set to TOP when it is None.
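The fallback amounts to something like the C sketch below. The enum and function names are invented for illustration and do not match the real RESOURCE_BARRIER field encoding.

```c
/* Hypothetical wait-stage values, for illustration only. */
enum sketch_wait_stage {
    SKETCH_WAIT_NONE = 0,
    SKETCH_WAIT_TOP,
    SKETCH_WAIT_COLOR,
    SKETCH_WAIT_GPGPU,
};

/* Never hand "None" to the barrier: fall back to TOP instead. */
static inline enum sketch_wait_stage
sketch_fixup_wait_stage(enum sketch_wait_stage wait)
{
    return wait == SKETCH_WAIT_NONE ? SKETCH_WAIT_TOP : wait;
}
```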
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
_mesa_sha1_format has a few remaining uses, so it's moved to build_id.c,
which is its last user.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
Dual source blending when one of the sources is not written to leaves
those values undefined, but the other should still be valid.
By omitting unwritten outputs, we ended up not writing anything at all
for the case that OUT1 is written to but OUT0 is undefined.
Fixes new CTS tests: dEQP-VK.pipeline.*.blend.dual_source.undefined_output.first*
Cc: mesa-stable
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40357>
When there is no trace pointer, there is usually another tracepoint
being emitted (see STATE_BASE_ADDRESS,
3DSTATE_BINDING_TABLE_POOL_ALLOC emission).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40503>
In commit 10b5b279a4 ("anv: Fix CmdResetEvent2() with RESOURCE_BARRIER::Wait stage == none")
I added asserts to catch invalid cases, but it looks like several tests are
affected by that problem, causing crashes in debug builds.
So remove those asserts for now; I will work on all the fixes and then bring
them back.
Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40476>
On integrated platforms, the L3 cache is not coherent with the CS, which
forces us to flush data out of L3.
To avoid the data cache flush, let's write the IR header with a BLORP shader.
There is a small shader launch latency, but eventually that should not
matter because writing data with CS (MI_STORE) commands is slower than
shader execution when we consider a large number of BVH trees being
built.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39971>
Add dedicated BLORP op enums so clear paths can be represented
precisely.
This is enum-only groundwork; behavior and trace output are wired in
follow-up commits.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40414>
The runtime builds a final pipeline state with pointers to structures
coming from the associated pipeline libraries.
So far it has treated the viewMask as part of a structure together with
the rest of the renderpass information. This information can be
specified in the pre-raster, fragment & color-output state groups, and
it was assumed to be consistent across all 3. The runtime currently
takes the pointer to the structure from the last pipeline library
(color output).
Upcoming spec/CTS changes will clarify that the viewMask only needs to be
specified for pre-raster & fragment groups, making the value in the
color-output group untrustworthy.
This change creates a new state structure to hold the viewMask on its
own so it is only gathered from the pre-raster & fragment groups.
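The gathering rule can be sketched in C as follows. All types and names are hypothetical stand-ins and are not the actual Vulkan runtime structures. The sketch only shows that a value supplied with the color-output group is never consulted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state-group bits carried by each pipeline library. */
#define SKETCH_GROUP_PRE_RASTER   (1u << 0)
#define SKETCH_GROUP_FRAGMENT     (1u << 1)
#define SKETCH_GROUP_COLOR_OUTPUT (1u << 2)

struct sketch_library {
    uint32_t groups;    /* which state groups this library provides */
    uint32_t view_mask; /* only meaningful for pre-raster/fragment */
};

/* Take the view mask only from libraries that provide the pre-raster or
 * fragment group. Returns false if no such library exists. */
static inline bool
sketch_gather_view_mask(const struct sketch_library *libs, size_t count,
                        uint32_t *view_mask_out)
{
    const uint32_t trusted = SKETCH_GROUP_PRE_RASTER | SKETCH_GROUP_FRAGMENT;
    bool found = false;

    for (size_t i = 0; i < count; i++) {
        if (libs[i].groups & trusted) {
            *view_mask_out = libs[i].view_mask;
            found = true;
        }
    }
    return found;
}
```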
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Aitor Camacho <aitor@lunarg.com> (kosmickrisp)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (turnip)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Frank Binns <frank.binns@imgtec.com> (powervr)
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (panvk)
Royaled-yes-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39940>
CmdResetEvent2() was calling anv_add_pending_pipe_bits() with no dst_stages,
resulting in RESOURCE_BARRIER::Wait stage == none, which causes a GPU hang in
the NVL-P simulator.
So set dst_stages to VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT and add
an assert in resource_barrier_wait_stage() to catch hw_stage == 0.
This fixes crucible func.event.cmd_buffer.q0 in the simulator.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40445>
RADV wants to abstract the compiler from any instance/device/pdev
objects.
The previous NULL check for instance seems to be useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40379>
Those values trace back to 2015, before the Vulkan 1.0 release. I have no
idea why they were set to this, except maybe the HALIGN_128 of
RENDER_SURFACE_STATE.
Anyway, after discussing this with Nanley, we don't think 128 bytes is more
optimal than 64 bytes. Nanley suggested the lowest value could be
16 bytes for the fixed functions inside the GPU (sampler, dataport),
but a cacheline probably makes more sense for the memory interface.
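To make the effect of a smaller alignment concrete, here is a generic power-of-two align-up helper in C. The name is hypothetical and the sketch is not tied to the specific limits this change touches.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static inline uint64_t
sketch_align_up(uint64_t value, uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, a 130-byte allocation is padded to 256 bytes with a 128-byte alignment, but only to 192 bytes with a 64-byte alignment.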
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40363>