v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
The current name doesn't cover all the tex related instructions and
in all usages, we already have a switch statement to dispatch
per instruction type, so is more natural to list the instructions we
care there.
In fs::is_send_from_grf() we can simply ignore them since the
instructions are either lowered directly to SEND (Gfx7+) or use
MRF (Gfx6-).
With this change, the fs_inst::size_read() generated code gets
simplified (the "tex" entries get added to the switch jump table
in gcc) and the default case loses the conditional handling tex.
This reduces shader compilation time, as illustrated by replaying
fossils (tested on my TGL laptop):
```
// Rise of the Tomb Raider (N=13)
Difference at 95.0% confidence
-1.32231 +/- 0.0170138
-4.37605% +/- 0.0563054%
(Student's t, pooled s = 0.0210159)
// Cyberpunk 2077 (N=7)
Difference at 95.0% confidence
-3.64 +/- 0.114993
-2.95188% +/- 0.0932544%
(Student's t, pooled s = 0.09873)
```
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>
Purely from the backend point of view it's just an additional
parameter to sampler messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
The version that takes a list of instructions is not used. I did not do
any archaeology to find out when the last user was removed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Those SENDs are still doing a full register write. We just inserted
some predication for a workaround.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
v2: drop the hardcoded inst->mlen=1 (Rohan)
v3: Move back to LOAD/STORE messages (limited to SIMD16 for LSC)
v4: Also use 4 GRFs transpose loads for fills (Curro)
v5: Reduce amount of needed register to build per lane offsets (Curro)
Drop some now useless SIMD32 code
Unify unspill code
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
This structure will contain the opcode mapping tables in the next
commit. For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
Previously can_do_source_mods was used to determine whether a value with
a source modifier or a value from a scalar source (e.g., a uniform)
could be copy propagated. The former is a superset of the latter, so
this always produces correct results, but it is overly restrictive. For
example, a BFI instruction can't have source modifiers, but it can have
scalar sources.
This was originally authored to prevent a small number of shader-db
regressions in a commit that marked SHR has not being able to have
source modifiers. That commit has since been dropped in favor of a
different method.
v2: Refactor register region restriction detection to a helper function.
Suggested by Jason.
No fossil-db changes on any Intel platform.
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20039111 -> 20038943 (<.01%)
instructions in affected programs: 31736 -> 31568 (-0.53%)
helped: 104
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.30% max: 0.88% x̄: 0.45% x̃: 0.42%
95% mean confidence interval for instructions value: -2.03 -1.20
95% mean confidence interval for instructions %-change: -0.47% -0.42%
Instructions are helped.
total cycles in shared programs: 980309750 -> 980308897 (<.01%)
cycles in affected programs: 591078 -> 590225 (-0.14%)
helped: 70
HURT: 26
helped stats (abs) min: 2 max: 622 x̄: 23.94 x̃: 4
helped stats (rel) min: <.01% max: 2.85% x̄: 0.33% x̃: 0.12%
HURT stats (abs) min: 2 max: 520 x̄: 31.65 x̃: 6
HURT stats (rel) min: 0.02% max: 2.45% x̄: 0.34% x̃: 0.15%
95% mean confidence interval for cycles value: -26.41 8.64
95% mean confidence interval for cycles %-change: -0.27% -0.03%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on earlier Intel platforms.
Reviewed-by: Anuj Phogat anuj.phogat@gmail.com [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9237>
I meant to do this years ago when I first added SHADER_OPCODE_SEND. At
the time, the only use for the extended descriptor was bindless handles
which were always one thing and never non-constant. However, it doesn't
actually require any extra instructions because we have to OR in ex_mlen
anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>
This pulls out the i965 IR definitions into a separate file and leaves
the top-level backend_shader structure and back-end compiler entry
points in brw_shader.h. The purpose is to keep things tidy and
prevent a nasty circular dependency between brw_cfg.h and
brw_shader.h. The logical dependency between these data structures
looks like:
backend_shader (brw_shader.h) -> cfg_t (brw_cfg.h)
-> bblock_t (brw_cfg.h) -> backend_instruction (brw_shader.h)
This circular header dependency is currently resolved by using forward
declarations of cfg_t/bblock_t in brw_shader.h and having brw_cfg.h
include brw_shader.h, which seems backwards and won't work at all when
the forward declarations of cfg_t/bblock_t are no longer sufficient in
a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>