Most of the churn in this commit is changing unit tests that were
testing things that are now invalid.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122204 -> 17122669 (<.01%)
instructions in affected programs: 120669 -> 121134 (0.39%)
helped: 0 / HURT: 124
total cycles in shared programs: 895602370 -> 895613210 (<.01%)
cycles in affected programs: 17868974 -> 17879814 (0.06%)
helped: 35 / HURT: 85
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736518 -> 210743769 (+0.00%)
Cycle count: 30377733040 -> 30377699060 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66056852 -> 66056966 (+0.00%)
Totals from 1505 (0.21% of 706776) affected shaders:
Instrs: 1890151 -> 1897402 (+0.38%)
Cycle count: 48397408 -> 48363428 (-0.07%); split: -0.11%, +0.04%
Max live registers: 256821 -> 256935 (+0.04%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
When I originally wrote that code, I didn't understand what a jerk NaN
can be.
v2: Remove the brw_type_is_uint stuff. This function is currently only
called for float types. In a later commit, integer types will be
supported but only for NZ and Z conditions. Noticed by Matt.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122197 -> 17122204 (<.01%)
instructions in affected programs: 1691 -> 1698 (0.41%)
helped: 0 / HURT: 4
total cycles in shared programs: 895602484 -> 895602370 (<.01%)
cycles in affected programs: 912964 -> 912850 (-0.01%)
helped: 2 / HURT: 2
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736388 -> 210736518 (+0.00%)
Cycle count: 30377728900 -> 30377733040 (+0.00%); split: -0.00%, +0.00%
Totals from 130 (0.02% of 706776) affected shaders:
Instrs: 169911 -> 170041 (+0.08%)
Cycle count: 18021210 -> 18025350 (+0.02%); split: -0.00%, +0.02%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
The previous commit converts any NOT that might have been affected by
this path into a simple MOV. Those MOVs are handled by other paths.
No shader-db or fossil-db changes on any Intel platform.
v2: Fix a bad squash. Changes that were accidentally in this commit were
supposed to be in the previous commit. Noticed by Ivan.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
On Xe platforms, many fragment shaders have patterns like:
asr(8) g21<2>W g1.2<0,1,0>W 15D
...
mov(8) g11<1>UW g21<16,8,2>UW
...
not.nz.f0.0(8) null<1>D g11<8,8,1>W
Converting the NOT.NZ to MOV.Z enables copy propagation to eliminate the
original MOV. Then cmod propagation is able to eliminate the
NOT-converted-to-MOV.
It might be possible to cover this case by adding more opcodes to the
list NOT can propagate to. The next commit will show that just
converting to MOV is a better approach anyway.
v2: Fix a bad squash. Changes that were supposed to be in this commit
were accidentally in the next commit. Noticed by Ivan.
shader-db:
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20069804 -> 20065167 (-0.02%)
instructions in affected programs: 592450 -> 587813 (-0.78%)
helped: 2300 / HURT: 0
total cycles in shared programs: 884534032 -> 884496201 (<.01%)
cycles in affected programs: 13064194 -> 13026363 (-0.29%)
helped: 1285 / HURT: 790
LOST: 18
GAINED: 15
fossil-db:
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 234506495 -> 234468664 (-0.02%)
Cycle count: 24444825202 -> 24445710703 (+0.00%); split: -0.01%, +0.01%
Max live registers: 42349793 -> 42349789 (-0.00%)
Max dispatch width: 7131344 -> 7131744 (+0.01%); split: +0.05%, -0.04%
Totals from 16673 (2.07% of 805781) affected shaders:
Instrs: 6497669 -> 6459838 (-0.58%)
Cycle count: 435877770 -> 436763271 (+0.20%); split: -0.54%, +0.74%
Max live registers: 1122972 -> 1122968 (-0.00%)
Max dispatch width: 151528 -> 151928 (+0.26%); split: +2.19%, -1.92%
No shader-db or fossil-db on any other Intel platforms.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
brw_opt_constant_fold_instruction can either do nothing or replace the
instruction with a MOV of an immediate value. Previously each opcode
case would perform this replacement, and code at the bottom of the
function would verify the results.
It is much simpler if each opcode case calculates a result in a brw_reg,
and code at the bottom of the function performs the replacement. There
are two outlier cases that cannot use this pattern: MAD and
BROADCAST. These cases simply return directly from the switch-statement
after performing the replacement.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34707>
We are reaching our limit of adding flags to intel_debug
(apporaching 64 flags). Switch intel_debug to a bitset,
which gives us almost "unlimited" bits to use in the future.
v2(Michael Cheng): Fixed a few ci errors
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34596>
This commit moves the calculation of tri_bary out of
brw_nir_rt_load_mem_hit_from_addr(), and only do the calculation on
demand, since unorm_float_convert can be expensive. We do this for both
Xe1/2 and Xe3+ for consistency.
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>
For Xe3+, the upper 8 bits of the second dword of a potential hit is
used to store hitGroupIndex0, which is stuffed by the HW. This
hitGroupIndex0 will later be used by the HW again to reconstruct the
whole hitGroupIndex when driver issues a TRACE_RAY_COMMIT.
We were corrupting this hitGroupIndex0 at the driver by setting the
whole dword to hit_kind, which will cause the HW to read a wrong
hitGroupIndex and therefore invoke a wrong closest hit shader. The
behavior can be seen in
dEQP-VK.ray_tracing_pipeline.pipeline_no_null_shaders_flag.gpu.boxes.\*
and dEQP-VK.ray_tracing_pipeline.pipeline_library.configurations.\*
This commit changes the driver to only use lower 24bits to store the
hit_kind, and leave the upper 8bits as it.
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>
This will help us to handle code path separately for Xe3+ for updated
64bit memory data structure for RT.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>
We can pass immediates to SHL and don't need to allocate a separate
register here.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34604>
A cooperative matrix conversion operation was represented in NIR by the
cmat_unary_op intrinsic with an nir_alu_op as extra parameter,
that was already lowered to a specific conversion operation
based on the matrix types.
Instead of that, add a new intrinsic `cmat_convert` that is specific
for that conversion. In addition to the src/dst matrix descriptions
already available, also include the signedness information in the
intrinsic (reuse nir_cmat_signed for that). This is needed because
different Convert operations define different interpretations for
integers, regardless their original type.
In this patch, both radv and intel were changed to use the same logic
that was previously used to pick the lowered ALU op.
This change will help represent cmat conversions involving BFloat16,
because it avoids having to create new NIR ALU ops for all the
combinations involving BFloat16.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>
Floating point SEL.CMOD may flush denorms to zero. We don't have enough
information at this point in compilation to know whether or not it is
safe to remove that.
Integer SEL or SEL without a conditional modifier is just a fancy
MOV. Those are always safe to eliminate.
See also 3f782cdd25.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
The condition modifier on SEL means something completely different than
it means on MOV. On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.
The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
Floating point SEL.CMOD may flush denorms to zero. We don't have enough
information at this point in compilation to know whether or not it is
safe to remove that.
Integer SEL or SEL without a conditional modifier is just a fancy
MOV. Those are always safe to eliminate.
See also 3f782cdd25.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209903490 -> 209903492 (+0.00%)
Cycle count: 30546025224 -> 30546021980 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65516231 -> 65516235 (+0.00%)
Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3197 -> 3199 (+0.06%)
Cycle count: 361650 -> 358406 (-0.90%); split: -10.05%, +9.15%
Max live registers: 300 -> 304 (+1.33%)
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
The condition modifier on SEL means something completely different than
it means on MOV. On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.
The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.
No shader-db or shader-db changes on any Intel platform.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
Gfx125+ has systolic, with exception for MTL and some ARL
variants. Update code and tests to use it.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
This allows us to create temporary VGRFs that are larger than
MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not
be split on the initial pass, because we may need LOAD_PAYLOAD lowering,
copy propagation, and so on to occur first. So we allow registers to
exceed that size initially.
The "Register allocation relies on split_virtual_grfs()" assertion in
brw_reg_allocate.cpp still asserts that all VGRFs which reach the
register allocator have been properly split.
One case where this is useful is for vectorizing convergent block loads.
We create temporaries to splat the SIMD1 values out to SIMD(N), which
can lead to some very large temporaries. However, copy propagation and
so on ultimately eliminate these and they'll get split down to proper
sizes or elided entirely in the end.
(Note: both this and the prior commits from this merge request are
needed to close the linked issue.)
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
Post-RA scheduling doesn't use liveness analysis, so we continue using
MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use
live->max_vgrf_size.
This helps get us to a place where we can emit arbitrarily large VGRFs
early on in compilation, but which will be split and cleaned up prior to
register allocation. It may also allocate smaller arrays in practice
since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things
we actually could need to allocate.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
We're already looking at this data to calculate the per-component
vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the
largest VGRF size while we're here. This will allow passes to size
arrays based on the actual size needed, rather than hardcoding some
fixed size. In many cases, MAX_VGRF_SIZE(devinfo) is larger than
necessary, because e.g. vec5 sparse sampling results aren't used.
Not hardcoding this means we can also temporarily handle very large
VGRFs which we know will be split eventually, without having to
increase the maximum which is ultimately used for RA classes.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
This is necessary to appropriately uniformize the first component
access of a convergent vector. Without this, this is produced:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0:F, 0.5f
add(32) %22:F, %18+2.0<0>:F, 0.5f
This is the correct code:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0<0>:F, 0.5f
add(32) %22:F, %18+2.0<0>:F, 0.5f
Without 38b58e286f, the code generated was more incorrect, but happened
to work for this test case:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0<0>:F, 0.5f
add(32) %22:F, %18+0.4<0>:F, 0.5f
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 38b58e286f ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset")
Closes: #12969
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>
Makes the intention of some comparisons clearer by using the named
helper functions. Add commentary when the straightforward range is not
the one used, e.g. VGRF interference.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
Add support for dumping shader asm containing instruction line numbers
matching offsets within instruction state pool buffer. Offsets
should match values collected from eu stall sampling. This is
required for match eu stall data with individual shader instructions.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
Expand constant sources to cover the region read by DPAS, and also
use NULL register as accumulator when possible.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>
Funny story... this is how regular b2f was implemented before Curro
implmented the `MOV dst:F -src:D` method 9 years ago (see
3ee2daf23d).
Eliminating the type conversion in the arithmetic operation enables the
next commit.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
This commit prevents code quality regressions in the next
commit. Without this, some fragment shaders in Batman: Arkham Origins
have code like:
shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V
...
and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW
...
add(8) g56<1>D -g52<8,8,1>D 1D
transformed to
shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V
...
and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW
...
mov(8) g56<1>D -g52<8,8,1>D
...
and(8) g57<1>UD ~g56<8,8,1>D 0x00000001UD
Propagating through the negation allows the added MOV to be deleted.
shader-db:
All Intel platforms had simlar results. (Lunar Lake shown)
total instructions in shared programs: 16968020 -> 16968019 (<.01%)
instructions in affected programs: 281 -> 280 (-0.36%)
helped: 1 / HURT: 0
total cycles in shared programs: 914598850 -> 914598832 (<.01%)
cycles in affected programs: 5398 -> 5380 (-0.33%)
helped: 1 / HURT: 0
A single Blender vertex shader was affected.
fossil-db:
Lunar Lake, Tiger Lake, Ice Lake, and Skylake had similar results. (Lunar Lake shown)
Totals:
Instrs: 209894650 -> 209894651 (+0.00%)
Cycle count: 30545958586 -> 30545952860 (-0.00%)
Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3582 -> 3583 (+0.03%)
Cycle count: 1875100 -> 1869374 (-0.31%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Subgroup size: 9906400 -> 9906416 (+0.00%)
Totals from 2 (0.00% of 805770) affected shaders:
Subgroup size: 16 -> 32 (+100.00%)
Two compute shaders in Hogwarts Legacy were affected.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
This is mostly defensive. If a convergent value ever ended up as a
source of a DDX or DDY, the eu_emit code will ignore the stride. This
will result in bad code being generated.
No shader-db or fossil-db changes on any Intel platform.
v2: DDX and DDY will always be float, but brw_imm_for_type only works
with integer types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ken
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>