v2: Add support for WE_all instructions... this already just worked, so
I only had to delete the check and the FINISHME comment.
v3: Use logic more like def_analysis::update_for_reads to determine when
to not insert LOAD_REG instructions. Based on a suggestion by Ken.
v4: Eliminate "store" from all the names since STORE_REG does not exist
anymore. Fold insert_load_reg into brw_insert_load_reg. Elminate extra
call to s.def_analysis.require() after progress. Pull a loop-invariant
check out of the inst->srouces loop. Drop call to
brw_opt_split_virtual_grfs after lowering load_reg. All suggested by
Caio.
v5: Assert that LOAD_REG doesn't already exist in
brw_insert_load_reg. Update comment before fully_defines. Both
suggested by Caio.
v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL.
Move the inst->dst.file != VGRF check earlier to avoid the loop over
sources. Both suggested by Ken. Move the call the brw_insert_load_reg
a little bit later, and explain why it's at that location. Suggested
by Caio.
v7: Many changes to the for-each-source loop in brw_insert_load_reg.
Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds
checks for matching SIMD size and NoMask in the search for pre-existing
LOAD_REG of same value.
v8: Add some unit tests. Suggested by Caio.
shader-db:
Lunar Lake
total instructions in shared programs: 16923237 -> 16921895 (<.01%)
instructions in affected programs: 450565 -> 449223 (-0.30%)
helped: 251 / HURT: 377
total cycles in shared programs: 910428418 -> 889920590 (-2.25%)
cycles in affected programs: 719248184 -> 698740356 (-2.85%)
helped: 9076 / HURT: 9082
total fills in shared programs: 2242 -> 2218 (-1.07%)
fills in affected programs: 116 -> 92 (-20.69%)
helped: 2 / HURT: 0
total sends in shared programs: 848635 -> 848421 (-0.03%)
sends in affected programs: 810 -> 596 (-26.42%)
helped: 10 / HURT: 0
LOST: 82
GAINED: 78
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19875784 -> 19871694 (-0.02%)
instructions in affected programs: 1050091 -> 1046001 (-0.39%)
helped: 251 / HURT: 2403
total cycles in shared programs: 905328238 -> 882446458 (-2.53%)
cycles in affected programs: 682736344 -> 659854564 (-3.35%)
helped: 7869 / HURT: 7911
total spills in shared programs: 5512 -> 5032 (-8.71%)
spills in affected programs: 1830 -> 1350 (-26.23%)
helped: 8 / HURT: 0
total fills in shared programs: 5648 -> 4782 (-15.33%)
fills in affected programs: 3312 -> 2446 (-26.15%)
helped: 8 / HURT: 0
total sends in shared programs: 1032942 -> 1032722 (-0.02%)
sends in affected programs: 572 -> 352 (-38.46%)
helped: 10 / HURT: 0
LOST: 138
GAINED: 53
Tiger Lake
total instructions in shared programs: 19711930 -> 19715591 (0.02%)
instructions in affected programs: 1040623 -> 1044284 (0.35%)
helped: 317 / HURT: 2474
total cycles in shared programs: 862988990 -> 860573870 (-0.28%)
cycles in affected programs: 612392461 -> 609977341 (-0.39%)
helped: 7447 / HURT: 7686
total sends in shared programs: 1034763 -> 1034555 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 56
GAINED: 143
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20545461 -> 20545220 (<.01%)
instructions in affected programs: 422405 -> 422164 (-0.06%)
helped: 180 / HURT: 459
total cycles in shared programs: 872697345 -> 866874523 (-0.67%)
cycles in affected programs: 573117917 -> 567295095 (-1.02%)
helped: 6783 / HURT: 6980
total spills in shared programs: 4335 -> 4336 (0.02%)
spills in affected programs: 90 -> 91 (1.11%)
helped: 1 / HURT: 2
total fills in shared programs: 4194 -> 4196 (0.05%)
fills in affected programs: 463 -> 465 (0.43%)
helped: 1 / HURT: 2
total sends in shared programs: 1079446 -> 1079238 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 117
GAINED: 37
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01%
Send messages: 10927753 -> 10927640 (-0.00%)
Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62%
Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08%
Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26%
Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%)
Totals from 503284 (71.25% of 706391) affected shaders:
Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01%
Send messages: 9699732 -> 9699619 (-0.00%)
Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63%
Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08%
Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27%
Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00%
Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
Calculate the IP ranges of the shader as an analysis pass. This will
later replace the existing tracking of start_ip/end_ip as the blocks are
changed (and the defer/adjust scheme to avoid too much churn when that
happen).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
The base class was used when we had vec4, but now we can fold it with
its only subclass. Declare fs_visitor now as a struct to be able to
forward declare for C code without causing errors due to class/struct
being mixed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
This will allow simplify the OPT macro for fs_visitor::optimize().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
Those are used in the failure paths and are easily retriavable from the
stage itself, so no need to store them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25367>
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
Delete unnecessary virtual functions, we need just two. Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.
Rename the virtuals so overrides don't hide the common convenience
functions. Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
This structure will contain the opcode mapping tables in the next
commit. For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
We only have a single prog_data::total_scratch for all shader variants
(SIMD 8, 16, 32). Therefore we should always max the total_scratch on
top of existing variant.
We probably haven't run into that issue before because we compile by
increasing SIMD size and higher SIMD size is more likely to spill. But
for bindless shaders with return shaders, if the last return part
doesn't spill, we completely ignore the previous parts' scratch
computation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15193>
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author. Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op. When we throw all the instruction
ordering information away, we loose any help the author may have given
us. By attempting NONE before we fall back to the worst case LIFO mode.
And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.
total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
instructions in affected programs: 33669 -> 25719 (-23.61%)
helped: 20
HURT: 0
helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
95% mean confidence interval for instructions value: -867.61 72.61
95% mean confidence interval for instructions %-change: -21.74% -7.46%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
cycles in affected programs: 18620349 -> 18078769 (-2.91%)
helped: 104
HURT: 48
helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
HURT stats (abs) min: 10 max: 54724 x̄: 6118.62 x̃: 1530
HURT stats (rel) min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
95% mean confidence interval for cycles value: -5724.34 -1401.71
95% mean confidence interval for cycles %-change: -9.86% -4.10%
Cycles are helped.
total spills in shared programs: 12158 -> 10327 (-15.06%)
spills in affected programs: 1831 -> 0
helped: 20
HURT: 0
total fills in shared programs: 14749 -> 12635 (-14.33%)
fills in affected programs: 2114 -> 0
helped: 20
HURT: 0
LOST: 8
GAINED: 649
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Use the same URB access helpers that were added for Task Output. The
Arrayed I/O (per-primitive and per-vertex) is handled by applying the
pitch from the MUE layout into the NIR intrinsics and including the
non-arrayed offset on top of it. After that, the index src can be
used directly for lowering.
Because we keep around the non-arrayed offset AND the pitch is
aligned, we can identify cases where the access is indirect but
guaranteed to be aligned, and dispatch a single message. Added a TODO
to explore that later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Implement the output written by the task *workgroup* and available to
all the mesh *workgroups* dispatched from that task. We currently
ignore any layout annotations (since they are not really testable) and
produce a (packed) layout ourselves.
The URB messages are only SIMD8, so for larger SIMDs, the functions
will produce multiple messages. Making this lowering here instead of
the generic lower_simd_width() since it is not just a matter of
zip/unzip, e.g. the offset must be adjusted.
Indirect writes/reads are implemented by handling one component at a
time and using the PER_SLOT variant of the messages.
Note that VK_NV_mesh_shader allows reading outputs, so add support for
that as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
The callers already have this value, and we would like to make it
follow different rules other than stage that might not be visible to
the helper function, so just pass explicitly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7382>
Some day we probably want to move it out of brw_shader if we're going to
share it with IBC but that can be another day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
This only does half of the work. The actual representation of the
idom tree is left untouched, but the computation algorithm is moved
into a separate analysis result class wrapped in a BRW_ANALYSIS
object, along with the intersect() and dump_domtree() auxiliary
functions in order to keep things tidy.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This reflects the natural dependency relationship between brw_cfg.h
and brw_shader.h. brw_cfg.h only requires the base IR definitions
which are now part of a separate header.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This pulls out the i965 IR definitions into a separate file and leaves
the top-level backend_shader structure and back-end compiler entry
points in brw_shader.h. The purpose is to keep things tidy and
prevent a nasty circular dependency between brw_cfg.h and
brw_shader.h. The logical dependency between these data structures
looks like:
backend_shader (brw_shader.h) -> cfg_t (brw_cfg.h)
-> bblock_t (brw_cfg.h) -> backend_instruction (brw_shader.h)
This circular header dependency is currently resolved by using forward
declarations of cfg_t/bblock_t in brw_shader.h and having brw_cfg.h
include brw_shader.h, which seems backwards and won't work at all when
the forward declarations of cfg_t/bblock_t are no longer sufficient in
a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This method is similar to the existing ::equals methods. Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.
v2: Simplify various checks based on suggestions from Matt. Use
src_reg::type instead of fixed_hw_reg.type in a check. Also suggested
by Matt.
v3: Rebase on 3 years. Fix some problems with negative_equals with VF
constants. Add fs_reg::negative_equals.
v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF. Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case. Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.
The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This eliminates a layer of wrapping, and makes a backend_instruction
sufficient. The downside is that it exposes 'eot' to the vec4 backend,
which it doesn't need, but can basically happily ignore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Pallavi G <pallavi.g@intel.com>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_shader.h (Browse further)