Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The fs_reg::nr field currently has a somewhat inconsistent meaning for
ATTR registers depending on the shader stage. In geometry stages it
has a similar effect as fs_reg::offset except it's expressed in 32B
units instead of B units. In the PS however it's expressed in units
of logical scalar attributes (16B on present platforms), which isn't
currently handled correctly throughout the back-end since some places
assume 32B units in all cases.
The different format of the PS setup data in multi-polygon dispatch
modes would make its behavior even more irregular, which would be
worsened further (for both geometry and pixel stages) by the register
size changes coming up on Xe2, particularly in brw_ir_fs.h helpers
where neither the devinfo struct nor the shader stage are available.
Instead of treating it as an offset simply consider different
fs_reg::nr indices as denoting disjoint spaces that can never be
accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Per Ian suggestion. Also clear up a few unnecessary casts around the code and
use `s` for fs_visitor ("shader"). Note to include a reference in ntf we need
to set it during initialization, so create an explicit mem_ctx for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Uses the dispatch_width from the shader (fs_visitor). This was not
possible before because the dispatch_width was not part of
backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The remaining users can simply create a new builder at_end() if needed.
In many places a new builder object is already being constructed, so
just give more specific instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Use the "current bld" in nir_to_brw_state more widely, and also replace it
with an annotated version when applicable (to associate it with a NIR
instruction being lowered). After filling a block we reset it back to
the original value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Create a nir_to_brw_state struct that is valid only during the
NIR to backend translation and use it for nir_ssa_values array.
This removes some NIR specific handling out of the fs_visitor -- nowadays
effectively an fs_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
nir_emit_global_atomic should utilize lsc_op_num_data_values to
infer the number of operands for global atomic ops, following the same
pattern as nir_emit_surface_atomic
Fixes: 90a2137 ('intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26432>
Once NIR code is lowered and a few optimization passes have run, there
might be flag register interactions between instructions quite far
away from one another.
In the following case :
f0 = and r0, r1
...
fs_interpolate r2, r3
...
if f0
...
endif
If we lower fs_inteporlate while using the f0 register, we completely
garble the value meant for the if block.
To fix this, emit the predication for fs_interpolate in brw_fs_nir.cpp
when doing the NIR translation to the backend IR. This will guarantee
that the flag register interactions are visible to the optimization
passes, avoiding the problem above.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9757
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26306>
Those are not reused, so this will be the first and only allocation, so
no need to use the "realloc" variants.
For the fs_reg arrays, there's currently no particular reason to keep
them uninitialized, so zero-initialize them too -- not ideal but better
than random values.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26302>
The workaround applies specifically to Cube and Cube Arrays, so we can
still apply the optimization for the others.
Ideally we would like to pull opt_zero_samples logic into the lowering
sends -- to avoid adding a bit to communicate between passes. However
the texture coordinates for the LOGICAL backend instructions, which
are a common target for the optimization, are combined into offsets over
a single VGRF, so we can't easily identify the constant cases. The
copy-prop pass make this more visible for opt_zero_samples.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
For Task/Mesh WorkgroupID is now lowered to WorkgroupIndex by the
generic NIR pass, so we shouldn't hit this. We can now simplify the
asserting code in emit_work_group_id_setup().
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25977>
SIMD1 instructions are problematic because they are considered partial
writes. This increases the liveness of the destination register
written by those instructions. To workaround this we use UNDEF
instructions to bound the liveness of the register. But this causing
other issues like in this case :
undef(1) vgrf2
mov(1) vgrf2, u4.0
add(1) vgrf3, vgrf2.0, 64UD
In this case the copy propagation pass in unable to see that vgrf2 in
the add() instruction can be replaced with the uniform u4.0.
To fix this problem, we switch NoMask SIMD8 instructions that cover
the entire register. We can drop the UNDEF instructions and now copy
propagation can do its job.
Good results on 2 apps :
Cyberpunk 2077 :
Totals from 7258 (68.80% of 10549) affected shaders:
Instrs: 6332210 -> 6073833 (-4.08%); split: -4.11%, +0.03%
Cycles: 130667501 -> 127351268 (-2.54%); split: -3.12%, +0.58%
Subgroup size: 90320 -> 90400 (+0.09%)
Spill count: 90 -> 68 (-24.44%)
Fill count: 82 -> 64 (-21.95%)
Scratch Memory Size: 8192 -> 6144 (-25.00%)
Max live registers: 385464 -> 375152 (-2.68%)
Max dispatch width: 64336 -> 64424 (+0.14%); split: +0.96%, -0.82%
Gaining 60 SIMD16/SIMD32 shaders, loosing 33
Strange Brigade :
Totals from 2137 (53.12% of 4023) affected shaders:
Instrs: 1544031 -> 1457544 (-5.60%); split: -5.60%, +0.00%
Cycles: 22292564 -> 21868978 (-1.90%); split: -2.43%, +0.53%
Subgroup size: 25328 -> 25344 (+0.06%)
Max live registers: 113716 -> 111214 (-2.20%)
Max dispatch width: 17232 -> 18608 (+7.99%); split: +8.36%, -0.37%
Gaining 138 SIMD16/SIMD32 shaders, loosing 4
On app slightly negatively affected :
Dota2 :
Totals from 232 (14.73% of 1575) affected shaders:
Instrs: 30029 -> 28194 (-6.11%)
Cycles: 385155 -> 371422 (-3.57%); split: -3.59%, +0.02%
Max live registers: 6792 -> 6780 (-0.18%)
Max dispatch width: 2256 -> 2160 (-4.26%)
Loosing 6 SIMD32 shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>
Some recent NIR changes started generated those instructions. We need
to handle them to be able to rematerialize.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>
This is what most logical SEND messages do when they take a variable
number of components. 'inst->mlen' is expected to be zero for logical
SEND opcodes, which are expected to behave like plain arithmetic
operations, so certain automated transformations (like SIMD lowering)
can manipulate them without opcode-specific special-casing.
Guessing the number of components from 'inst->mlen' has other
disadvantages, because it requires duplicating the logic that infers
the message payload size in every use of the instruction -- Instead we
can just do the computation once during logical send lowering. In
addition on LNL platform this causes the 'inst->mlen' field of URB
writes to have units inconsistent with every other SEND instruction,
which is likely to lead to confusion and bugs down the road.
Rework:
* Marcin: update emit_urb_indirect_vec4_write
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>