The Bspec also says, "The table below describes the SIMD modes which
are supported. SIMD32 and SIMD64 are used for media-type operations
only." Perhaps this commit should just add
if (devinfo->ver >= 20)
return 16;
instead.
v2: Use reg_unit in get_sampler_lowered_simd_width. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Otherwise the code generator will attempt to emit SIMD-lowered
QUAD_SWIZZLE instructions with an execution group not multiple of 8,
which is invalid on Xe2+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27165>
On Xe2+, we don't build the SIMD8 shader so this check makes sure we
don't execute the uninitialized invocations.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26886>
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.
There are a couple FINISHME comments describing future optimizations.
v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.
v3: Use is_null() instead of checking file != ARF. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
Extend the pre-existing dual-SIMD8 compilation path in
brw_compile_fs() to attempt quad-SIMD8 and dual-SIMD16 compiles.
Instead of building every possible dispatch mode and then picking one
based on cycle-count heuristics, this attempts to only build a single
multipolygon kernel -- The different mulipolygon dispatch modes are
tried in the expected order of decreasing performance (quad-SIMD8,
dual-SIMD16 then dual-SIMD8), the first one that successfully compiles
without spills is taken as a simple heuristic, and no further
multipolygon builds are attempted.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This is needed because the information stored on the ATTR file for
multipolygon fragment shaders isn't stored as a contiguous sequence in
the GRF, instead the ATTR source may be lowered by assign_urb_setup()
to use a <16;8,0> region, which reads 4 SIMD16 GRFs for a SIMD32
instruction, even though the result of fs_inst::size_read() is
expected to be 2 GRFs. Special case ATTR sources for multipolygon PS
shaders to calculate the number of physical GRFs that will actually be
read by the instruction after lowering, based on the number of
polygons processed by the instruction.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This is based on a previous patch by Marcin Ślusarz addressing the
same issue, though it's largely rewritten, simplified and includes
additional fixes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This fixes a number of assumptions made by the multipolygon input
attribute handling code from assign_urb_setup() so it also works on
Xe2+, which has additional multipolygon dispatch modes (like SIMD4x8
and SIMD2x16) and uses a different more compact representation of the
plane parameters.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
The interpolation deltas of PS inputs now show up as a 12B vec3 (A0,
A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B
format with an unused component.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
The X and Y barycentric vectors are no longer interleaved in SIMD8
chunks (yay), so this is mostly a matter of disabling the
lower_barycentrics() pass and switching to a simpler implementation of
fetch_barycentric_reg() that simply calls fetch_payload_reg() instead
of the SIMD8 shuffling we had to do in previous generations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This extends fetch_payload_reg() to support fetching vector registers
like barycentrics stored on the payload as a contiguous sequence of
SIMD-wide vectors. In the SIMD32 case, both halves of the SIMD16
vector registers specified as regs[0] and regs[1] are zipped to
construct a single SIMD32-wide vector.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Note from Caio: proper handling of brw_sample_mask_reg
will appear in later patches.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
These restrictions don't seem to be applicable anymore, and limiting
to SIMD8 wouldn't work since we're no longer building shaders with
that dispatch width.
[ Francisco: This one-liner change was squashed by Rohan Garg into a
previous version of my patch "Stop building SIMD8 programs", but it
makes more sense as a separate commit -- Formatted as a separate
patch. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>
SIMD8 kernels are no longer able to utilize the ALUs efficiently,
since they have twice the vector width as previous platforms. However
even though there aren't many reasons to use it, SIMD8 is still
supported by the instruction set technically, and it will still be
used for some SIMD-lowering sequences.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>
Similar to other FS dispatch modes, attempt to build a dual-SIMD8
program if the regular SIMD8 program didn't spill and doubling the
amount of space for varyings doesn't cause us to go over the thread
payload limit. Dual-SIMD8 builds in combination with coarse pixel
shading are currently not handled.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The copy would be discarded immediately. Until now we were relying on
DCE to eliminate these, but it seems like in some cases MOVs into the
null register emitted by lower_simd_width() are never eliminated,
likely because a lower_simd_width() call has been introduced close to
the bottom of optimize() which isn't follow by any additional DCE
passes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
This fixes a number of regressions and hangs in multipolygon fragment
shaders that have FIND_LIVE_CHANNEL sequences which would otherwise
lead to access of a dead channel. Note that the failures don't seem
to be reproducible in simulation.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Since the <8;8,0> regions they use in multipolygon mode could violate
regioning restrictions in some cases, depending on the execution type
of the instruction. Note that the assertion is removed from
try_copy_propagate() since a more accurate check is used within that
function than what fs_inst::can_change_types() can do.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The updated layout includes one copy of each plane parameter per
channel of the SIMD thread, in order to allow channels to process
different polygons.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
To allow specifying the number of polygons that will be processed per
SIMD thread.
Rework:
* Jordan: Add needs_register_pressure following
09cdb77a92 ("intel/fs: report max register pressure in shader stats")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
And use it in ANV in order to return a "SIMDNxM" name from
vkGetPipelineExecutablePropertiesKHR.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Add fields that track the number of polygons processed per PS SIMD
thread (note that this might be lower than the value that was
specified to the compiler via brw_compile_fs_params if compilation at
the desired polygon count wasn't possible), and the dispatch width of
the multi-polygon PS kernel.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
In the past, multiple writes to a single register were pretty common,
but since we've transitioned to NIR, and leave the IR in SSA form for
everything not captured in a phi-web, the pattern of generating new
temporary registers at each step is a lot more common.
This pass isn't nearly as useful now. Across fossil-db on Alchemist,
this affects only 0.55% of shaders, which fall into two cases:
- Coarse pixel shading pixel-X/Y setup. There are a few cases where
we write a partial calculation into a register, then have a second
instruction read that as a source and overwrite it as a destination.
While we could use a temporary here, it doesn't actually help with
register pressure at all, since there's the same amount of values
live at both instructions regardless. So while this pass kicks in,
it doesn't do anything useful.
- Geometry shader control data bits (5 shaders total). We track masks
for handling EndPrimitive in a single register across the program,
and apparently in some cases can split the live range. However, it's
a single register...only in geometry shaders...which use EndPrimitive.
None of them appear to be in danger of spilling, either. So this tiny
benefit doesn't seem to justify the cost of running the pass.
So, just throw it out. It's not worth keeping.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>
Per Ian suggestion. Also clear up a few unnecessary casts around the code and
use `s` for fs_visitor ("shader"). Note to include a reference in ntf we need
to set it during initialization, so create an explicit mem_ctx for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Uses the dispatch_width from the shader (fs_visitor). This was not
possible before because the dispatch_width was not part of
backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
At this point this is more a header dependency due to inline functions,
so shuffle them around. The end goal is to allow fs_builder have a
reference to a fs_visitor (really a fs_shader).
Note the header is still included, a later patch will move the includes
to the call-sites.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The remaining users can simply create a new builder at_end() if needed.
In many places a new builder object is already being constructed, so
just give more specific instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The current name doesn't cover all the tex related instructions and
in all usages, we already have a switch statement to dispatch
per instruction type, so is more natural to list the instructions we
care there.
In fs::is_send_from_grf() we can simply ignore them since the
instructions are either lowered directly to SEND (Gfx7+) or use
MRF (Gfx6-).
With this change, the fs_inst::size_read() generated code gets
simplified (the "tex" entries get added to the switch jump table
in gcc) and the default case loses the conditional handling tex.
This reduces shader compilation time, as illustrated by replaying
fossils (tested on my TGL laptop):
```
// Rise of the Tomb Raider (N=13)
Difference at 95.0% confidence
-1.32231 +/- 0.0170138
-4.37605% +/- 0.0563054%
(Student's t, pooled s = 0.0210159)
// Cyberpunk 2077 (N=7)
Difference at 95.0% confidence
-3.64 +/- 0.114993
-2.95188% +/- 0.0932544%
(Student's t, pooled s = 0.09873)
```
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>