Makes the intention of some comparisons clearer by using the named
helper functions. Add commentary when the straightforward range is not
the one used, e.g. VGRF interference.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
VUID-VkDescriptorSetAllocateInfo-pSetLayouts-09380 says that :
"If pSetLayouts[i] was created with an element of pBindingFlags
that includes VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT,
and VkDescriptorSetVariableDescriptorCountAllocateInfo is included
in the pNext chain, and
VkDescriptorSetVariableDescriptorCountAllocateInfo::descriptorSetCount
is not zero, then
VkDescriptorSetVariableDescriptorCountAllocateInfo::pDescriptorCounts[i]
must be less than or equal to
VkDescriptorSetLayoutBinding::descriptorCount for the corresponding
binding used to create pSetLayouts[i]"
But applications like are not following the spec. RADV doesn't apply
that limit and allocates if there is enough space in the pool. Let's
just do the same.
Note that this issue got resolved with a vkd3d-proton change :
a7ac1a7d2f
But since this change is deleting more code than it adds, might as
well go with it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12185
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32305>
Created stand alone tool for sampling gfx data on regular
intervals. Tool has inner loop that performs sampling every N
useconds. Press any key to end sampling. Results will be dumped
when intel_monitor exits.
First application of intel_monitor will be to collect eu stall
data. Perhaps more applications can be added at a later date.
How to use:
0. Set sysctl dev.xe.observation_paranoid=0
1. Clean shader cache and launch gfx INTEL_DEBUG=shaders-lineno.
Redirect stderr to asm.txt.
2. When gfx app ready to monitor, begin capturing eustall data by
launching `intel_monitor -e > eustall.csv` in separate console.
3 When done collected, close intel_monitor by pressing any key.
4. Correlate eustall data in eustall.csv with shader instructions in
asm.txt by matching instruction offsets. Use data to determine which
instructions are stalling and why.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
Xe2+ GPUs have support for eu stall sampling perf debug feature.
This feature allows driver to collect count and reasons for why
EUs are stalled on GPU. Stall data is cross referenced with ip
address within individual shaders so it is possible to know which
instructions in which shaders are generating stalls. This should
be a very useful feature for debugging performance of slow shaders.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
Add support for dumping shader asm containing instruction line numbers
matching offsets within instruction state pool buffer. Offsets
should match values collected from eu stall sampling. This is
required for match eu stall data with individual shader instructions.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
A few Android tools are based on/assume the datasource names
gpu.renderstages and gpu.counters. It is less effort to align with that
naming for Android builds than to chase down those tools and fix them,
not to mention account for new tools that may be created in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34330>
Expand constant sources to cover the region read by DPAS, and also
use NULL register as accumulator when possible.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>
Funny story... this is how regular b2f was implemented before Curro
implmented the `MOV dst:F -src:D` method 9 years ago (see
3ee2daf23d).
Eliminating the type conversion in the arithmetic operation enables the
next commit.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
This commit prevents code quality regressions in the next
commit. Without this, some fragment shaders in Batman: Arkham Origins
have code like:
shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V
...
and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW
...
add(8) g56<1>D -g52<8,8,1>D 1D
transformed to
shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V
...
and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW
...
mov(8) g56<1>D -g52<8,8,1>D
...
and(8) g57<1>UD ~g56<8,8,1>D 0x00000001UD
Propagating through the negation allows the added MOV to be deleted.
shader-db:
All Intel platforms had simlar results. (Lunar Lake shown)
total instructions in shared programs: 16968020 -> 16968019 (<.01%)
instructions in affected programs: 281 -> 280 (-0.36%)
helped: 1 / HURT: 0
total cycles in shared programs: 914598850 -> 914598832 (<.01%)
cycles in affected programs: 5398 -> 5380 (-0.33%)
helped: 1 / HURT: 0
A single Blender vertex shader was affected.
fossil-db:
Lunar Lake, Tiger Lake, Ice Lake, and Skylake had similar results. (Lunar Lake shown)
Totals:
Instrs: 209894650 -> 209894651 (+0.00%)
Cycle count: 30545958586 -> 30545952860 (-0.00%)
Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3582 -> 3583 (+0.03%)
Cycle count: 1875100 -> 1869374 (-0.31%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Subgroup size: 9906400 -> 9906416 (+0.00%)
Totals from 2 (0.00% of 805770) affected shaders:
Subgroup size: 16 -> 32 (+100.00%)
Two compute shaders in Hogwarts Legacy were affected.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
This is mostly defensive. If a convergent value ever ended up as a
source of a DDX or DDY, the eu_emit code will ignore the stride. This
will result in bad code being generated.
No shader-db or fossil-db changes on any Intel platform.
v2: DDX and DDY will always be float, but brw_imm_for_type only works
with integer types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ken
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
The vast majority of the callers want channel = 0. During the
development process, using this default parameter value saved a lot of
pain in rebasing. However, it seems to be more trouble than it's worth.
Issue #12464 occurred because LNL was merged while this code was in
review. As a result, one caller of get_nir_src that wanted channel = -1
was not inspected closely, and it got the default channel = 0 instead.
To prevent this happening in the future (with possible branches still
yet to be merged, for example), remove the default parameter. This will
force the inspection of any callers that don't have an explicit channel
parameter. Hopefully that will prevent more problems.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
The source of nir_intrinsic_load_barycentric_at_offset is a vector, so
-1 should be passed to get_nir_src. This is also done for texture
sampling intrinsics.
I skimmed the other user of get_nir_src, and I believe they are
correct. This one was just missed as LNL support landed an many, many
rebases of the original MR occurred.
v2: Fix another get_nir_src call. Suggested by Lionel.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Closes: #12464
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
Ensure proper cleanup when memory allocation fails during HWCTX and VMA
parsing in `read_xe_data_file`. This ensures graceful error handling by
preventing potential memory leaks.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34371>
Some tests changed to avoid unintended overlap between operands which
would change the SWSB assigned. In some cases also changed the Gfx12
matching test so they remain equal.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>
Also update WHILE to optionally take a predicate (default to NONE). And
make the predicate in the IF optional (default to NORMAL).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34354>
The intervening_saturating_copy test is removed. The defs version of the
pass does not handle this case. It should not occur often in practice
anyway. Copy propagation and brw_nir_opt_fsat should prevent this
scenario from happening.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 212677275 -> 212677278 (+0.00%)
Cycle count: 30466062848 -> 30466056040 (-0.00%)
Totals from 1 (0.00% of 706300) affected shaders:
Instrs: 1343 -> 1346 (+0.22%)
Cycle count: 411664 -> 404856 (-1.65%)
v2: Stop counting ip. The non-defs part of the pass was the only thing
that used it.
v3: Also delete "if (block != def->block) continue;" code. I noticed
this while working on some other changes to this function. It's the last
thing in the loop, so it's totally useless. Delete some other spurious
continues too.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
v2: Add support for WE_all instructions... this already just worked, so
I only had to delete the check and the FINISHME comment.
v3: Use logic more like def_analysis::update_for_reads to determine when
to not insert LOAD_REG instructions. Based on a suggestion by Ken.
v4: Eliminate "store" from all the names since STORE_REG does not exist
anymore. Fold insert_load_reg into brw_insert_load_reg. Elminate extra
call to s.def_analysis.require() after progress. Pull a loop-invariant
check out of the inst->srouces loop. Drop call to
brw_opt_split_virtual_grfs after lowering load_reg. All suggested by
Caio.
v5: Assert that LOAD_REG doesn't already exist in
brw_insert_load_reg. Update comment before fully_defines. Both
suggested by Caio.
v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL.
Move the inst->dst.file != VGRF check earlier to avoid the loop over
sources. Both suggested by Ken. Move the call the brw_insert_load_reg
a little bit later, and explain why it's at that location. Suggested
by Caio.
v7: Many changes to the for-each-source loop in brw_insert_load_reg.
Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds
checks for matching SIMD size and NoMask in the search for pre-existing
LOAD_REG of same value.
v8: Add some unit tests. Suggested by Caio.
shader-db:
Lunar Lake
total instructions in shared programs: 16923237 -> 16921895 (<.01%)
instructions in affected programs: 450565 -> 449223 (-0.30%)
helped: 251 / HURT: 377
total cycles in shared programs: 910428418 -> 889920590 (-2.25%)
cycles in affected programs: 719248184 -> 698740356 (-2.85%)
helped: 9076 / HURT: 9082
total fills in shared programs: 2242 -> 2218 (-1.07%)
fills in affected programs: 116 -> 92 (-20.69%)
helped: 2 / HURT: 0
total sends in shared programs: 848635 -> 848421 (-0.03%)
sends in affected programs: 810 -> 596 (-26.42%)
helped: 10 / HURT: 0
LOST: 82
GAINED: 78
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19875784 -> 19871694 (-0.02%)
instructions in affected programs: 1050091 -> 1046001 (-0.39%)
helped: 251 / HURT: 2403
total cycles in shared programs: 905328238 -> 882446458 (-2.53%)
cycles in affected programs: 682736344 -> 659854564 (-3.35%)
helped: 7869 / HURT: 7911
total spills in shared programs: 5512 -> 5032 (-8.71%)
spills in affected programs: 1830 -> 1350 (-26.23%)
helped: 8 / HURT: 0
total fills in shared programs: 5648 -> 4782 (-15.33%)
fills in affected programs: 3312 -> 2446 (-26.15%)
helped: 8 / HURT: 0
total sends in shared programs: 1032942 -> 1032722 (-0.02%)
sends in affected programs: 572 -> 352 (-38.46%)
helped: 10 / HURT: 0
LOST: 138
GAINED: 53
Tiger Lake
total instructions in shared programs: 19711930 -> 19715591 (0.02%)
instructions in affected programs: 1040623 -> 1044284 (0.35%)
helped: 317 / HURT: 2474
total cycles in shared programs: 862988990 -> 860573870 (-0.28%)
cycles in affected programs: 612392461 -> 609977341 (-0.39%)
helped: 7447 / HURT: 7686
total sends in shared programs: 1034763 -> 1034555 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 56
GAINED: 143
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20545461 -> 20545220 (<.01%)
instructions in affected programs: 422405 -> 422164 (-0.06%)
helped: 180 / HURT: 459
total cycles in shared programs: 872697345 -> 866874523 (-0.67%)
cycles in affected programs: 573117917 -> 567295095 (-1.02%)
helped: 6783 / HURT: 6980
total spills in shared programs: 4335 -> 4336 (0.02%)
spills in affected programs: 90 -> 91 (1.11%)
helped: 1 / HURT: 2
total fills in shared programs: 4194 -> 4196 (0.05%)
fills in affected programs: 463 -> 465 (0.43%)
helped: 1 / HURT: 2
total sends in shared programs: 1079446 -> 1079238 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 117
GAINED: 37
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01%
Send messages: 10927753 -> 10927640 (-0.00%)
Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62%
Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08%
Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26%
Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%)
Totals from 503284 (71.25% of 706391) affected shaders:
Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01%
Send messages: 9699732 -> 9699619 (-0.00%)
Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63%
Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08%
Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27%
Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00%
Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
This prevents assertion failures in brw_eu_emit in a later commit in
this MR. Even though they have not been previously observed, these
assertion failures could happen even without that commit.
No shader-db or fossil-db changes on any Intel platform.
Fixes: 04e1783278 ("brw: Call brw_fs_opt_algebraic less often")
v2: Add SHUFFLE. Suggested by Ken. Fixed indentation.
v3: Update BROADCAST exec_size after rebasing on "brw/build: Use SIMD8
temporaries in emit_uniformize".
v4: Explain why munging the exec_size is correct.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>