We only need it for indirect draws.
Improves performance on an i7-12700 and A770:
- Piglit's drawoverhead base case +150.639% +/- 2.86933% (n=15).
- gfxbench5 gl_driver2_off +19.7219% +/- 1.13778% (n=15)
- SPECviewperf2020 catiav5test1 +1.6831% +/- 0.552052% (n=10).
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
Whenever we use a BO in a batch, we need to find its corresponding exec
list entry, either to a) record that it's been used, b) update whether
it's being written, c) check for cross-batch implicit dependencies.
bo->index exists to accelerate these lookups. If a BO is used multiple
times by a batch, bo->index is its location in the list. Because the
field is global, and a BO can in theory be used concurrently by multiple
contexts, we need to double-check whether it's still there. If not, we
fall back to a linear search of all BOs in the list, looking to see if
our index was simply wrong (but presumably right for another context).
However, there's one glaringly obvious case that we missed here. If
bo->index is -1, then it's wrong for /all/ contexts, and in fact implies
that said BO has never been added to any exec list, ever. This is quite
common in fact: a new BO, never been used before, say from the BO cache,
or streaming uploaders, gets used for the first time.
In this case we can simply conclude that it's not in the list and skip
the linear walk through all buffers referenced by the batch.
Improves performance on an i7-12700 and A770:
- SPECviewperf2020 catiav5test1: 72.9214% +/- 0.312735% (n=45)
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
A value of -1 means that the buffer has never been used in an execbuf
buffer list in any of our contexts. While setting this isn't critical,
doing so will allow us to short-circuit some looping in the next patch.
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
This function is only used once. By inlining it, we can more easily
compare the CCS plane import code with the clear color plane import
code.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25768>
Inline iris_resource_finish_aux_import and reimplement it with the
helper functions for managing resource planes. Provides more testing for
the helper functions and simplifies the code.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25768>
Instead of checking the plane count in order to finish importing the aux
info, just check the plane index. Planes are added in reverse, so we'll
have the complete number of planes once we reach plane zero.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25768>
These restrictions don't seem to be applicable anymore, and limiting
to SIMD8 wouldn't work since we're no longer building shaders with
that dispatch width.
[ Francisco: This one-liner change was squashed by Rohan Garg into a
previous version of my patch "Stop building SIMD8 programs", but it
makes more sense as a separate commit -- Formatted as a separate
patch. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>
SIMD8 kernels are no longer able to utilize the ALUs efficiently,
since they have twice the vector width as previous platforms. However
even though there aren't many reasons to use it, SIMD8 is still
supported by the instruction set technically, and it will still be
used for some SIMD-lowering sequences.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>
Similar to other FS dispatch modes, attempt to build a dual-SIMD8
program if the regular SIMD8 program didn't spill and doubling the
amount of space for varyings doesn't cause us to go over the thread
payload limit. Dual-SIMD8 builds in combination with coarse pixel
shading are currently not handled.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
It's not replicated per SIMD16 half of a SIMD32 thread on the PS
payload. Make fs_visitor::payload::depth_w_coef_reg a scalar rather
than an array.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The copy would be discarded immediately. Until now we were relying on
DCE to eliminate these, but it seems like in some cases MOVs into the
null register emitted by lower_simd_width() are never eliminated,
likely because a lower_simd_width() call has been introduced close to
the bottom of optimize() which isn't follow by any additional DCE
passes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
This fixes a number of regressions and hangs in multipolygon fragment
shaders that have FIND_LIVE_CHANNEL sequences which would otherwise
lead to access of a dead channel. Note that the failures don't seem
to be reproducible in simulation.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Flat-shaded inputs and other per-primitive values can no longer be
considered to be uniform across fragment shader subgroups due to
multipolygon dispatch.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Since the <8;8,0> regions they use in multipolygon mode could violate
regioning restrictions in some cases, depending on the execution type
of the instruction. Note that the assertion is removed from
try_copy_propagate() since a more accurate check is used within that
function than what fs_inst::can_change_types() can do.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The updated layout includes one copy of each plane parameter per
channel of the SIMD thread, in order to allow channels to process
different polygons.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Main motivation is that for multipolygon PS shaders the i-th plane
parameter for the j-th input attribute will no longer necessarily be a
scalar, since different channels may be processing different polygons
with different input plane parameters, so simply taking a component()
of the result of interp_reg() will no longer work. Instead of
duplicating the multipolygon handling logic in every caller of
interp_reg(), fold the component() call into interp_reg() so we can
replace it with multipolygon-correct code more easily.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The fs_reg::nr field currently has a somewhat inconsistent meaning for
ATTR registers depending on the shader stage. In geometry stages it
has a similar effect as fs_reg::offset except it's expressed in 32B
units instead of B units. In the PS however it's expressed in units
of logical scalar attributes (16B on present platforms), which isn't
currently handled correctly throughout the back-end since some places
assume 32B units in all cases.
The different format of the PS setup data in multi-polygon dispatch
modes would make its behavior even more irregular, which would be
worsened further (for both geometry and pixel stages) by the register
size changes coming up on Xe2, particularly in brw_ir_fs.h helpers
where neither the devinfo struct nor the shader stage are available.
Instead of treating it as an offset simply consider different
fs_reg::nr indices as denoting disjoint spaces that can never be
accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
To allow specifying the number of polygons that will be processed per
SIMD thread.
Rework:
* Jordan: Add needs_register_pressure following
09cdb77a92 ("intel/fs: report max register pressure in shader stats")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
And use it in ANV in order to return a "SIMDNxM" name from
vkGetPipelineExecutablePropertiesKHR.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>