Main motivation is that for multipolygon PS shaders the i-th plane
parameter for the j-th input attribute will no longer necessarily be a
scalar, since different channels may be processing different polygons
with different input plane parameters, so simply taking a component()
of the result of interp_reg() will no longer work. Instead of
duplicating the multipolygon handling logic in every caller of
interp_reg(), fold the component() call into interp_reg() so we can
replace it with multipolygon-correct code more easily.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The fs_reg::nr field currently has a somewhat inconsistent meaning for
ATTR registers depending on the shader stage. In geometry stages it
has a similar effect as fs_reg::offset except it's expressed in 32B
units instead of B units. In the PS however it's expressed in units
of logical scalar attributes (16B on present platforms), which isn't
currently handled correctly throughout the back-end since some places
assume 32B units in all cases.
The different format of the PS setup data in multi-polygon dispatch
modes would make its behavior even more irregular, which would be
worsened further (for both geometry and pixel stages) by the register
size changes coming up on Xe2, particularly in brw_ir_fs.h helpers
where neither the devinfo struct nor the shader stage are available.
Instead of treating it as an offset simply consider different
fs_reg::nr indices as denoting disjoint spaces that can never be
accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
To allow specifying the number of polygons that will be processed per
SIMD thread.
Rework:
* Jordan: Add needs_register_pressure following
09cdb77a92 ("intel/fs: report max register pressure in shader stats")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
And use it in ANV in order to return a "SIMDNxM" name from
vkGetPipelineExecutablePropertiesKHR.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Add fields that track the number of polygons processed per PS SIMD
thread (note that this might be lower than the value that was
specified to the compiler via brw_compile_fs_params if compilation at
the desired polygon count wasn't possible), and the dispatch width of
the multi-polygon PS kernel.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Add a brw_compile_fs_params parameter that specifies to the compiler
the maximum number of polygons that may be processed in parallel per
PS SIMD thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
In the past, multiple writes to a single register were pretty common,
but since we've transitioned to NIR, and leave the IR in SSA form for
everything not captured in a phi-web, the pattern of generating new
temporary registers at each step is a lot more common.
This pass isn't nearly as useful now. Across fossil-db on Alchemist,
this affects only 0.55% of shaders, which fall into two cases:
- Coarse pixel shading pixel-X/Y setup. There are a few cases where
we write a partial calculation into a register, then have a second
instruction read that as a source and overwrite it as a destination.
While we could use a temporary here, it doesn't actually help with
register pressure at all, since there's the same amount of values
live at both instructions regardless. So while this pass kicks in,
it doesn't do anything useful.
- Geometry shader control data bits (5 shaders total). We track masks
for handling EndPrimitive in a single register across the program,
and apparently in some cases can split the live range. However, it's
a single register...only in geometry shaders...which use EndPrimitive.
None of them appear to be in danger of spilling, either. So this tiny
benefit doesn't seem to justify the cost of running the pass.
So, just throw it out. It's not worth keeping.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>
This works exactly the same as the other atomics and the missing
destination is already handled in lower_logical_sends().
Only affects 2 shaders in fossil-db (in Cyberpunk 2077), but the
cycle count drops by 4.23%. Nice to have in place at any rate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>
Per Ian suggestion. Also clear up a few unnecessary casts around the code and
use `s` for fs_visitor ("shader"). Note to include a reference in ntf we need
to set it during initialization, so create an explicit mem_ctx for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Uses the dispatch_width from the shader (fs_visitor). This was not
possible before because the dispatch_width was not part of
backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
At this point this is more a header dependency due to inline functions,
so shuffle them around. The end goal is to allow fs_builder have a
reference to a fs_visitor (really a fs_shader).
Note the header is still included, a later patch will move the includes
to the call-sites.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The remaining users can simply create a new builder at_end() if needed.
In many places a new builder object is already being constructed, so
just give more specific instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Use the "current bld" in nir_to_brw_state more widely, and also replace it
with an annotated version when applicable (to associate it with a NIR
instruction being lowered). After filling a block we reset it back to
the original value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Create a nir_to_brw_state struct that is valid only during the
NIR to backend translation and use it for nir_ssa_values array.
This removes some NIR specific handling out of the fs_visitor -- nowadays
effectively an fs_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
There was a subtle bug related to CFG tracking. Namely, some branch
instructions may point *only* to the block after the DO instruction
for the loop. If the MOV instructions are in the DO block, the may not
have liveness properly tracked.
Like in !25132, having the MOV instructions in blocks that might
contain other instructions helps scheduling.
shader-db:
All Broadwell and newer Intel GPUs had similar results (Ice Lake shown)
total cycles in shared programs: 848577248 -> 848557268 (<.01%)
cycles in affected programs: 78256396 -> 78236416 (-0.03%)
helped: 361 / HURT: 18
fossil-db:
All Skylake and newer Intel GPUs had similar results (Ice Lake shown)
Totals:
Cycles: 15021501924 -> 15021372904 (-0.00%); split: -0.00%, +0.00%
Totals from 735 (0.11% of 656080) affected shaders:
Cycles: 676429502 -> 676300482 (-0.02%); split: -0.02%, +0.00%
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26439>
nir_emit_global_atomic should utilize lsc_op_num_data_values to
infer the number of operands for global atomic ops, following the same
pattern as nir_emit_surface_atomic
Fixes: 90a2137 ('intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26432>
align is a function and when we want use it, the align variable will shadow it
So replace it with other names
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25997>