Commit graph

113 commits

Author SHA1 Message Date
Caio Oliveira
e0614e8ea1 intel/brw: Use brw_analysis prefix for liveness analysis files
Move declaration to the common header and rename definition file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:06 +00:00
Caio Oliveira
d59bd421a2 intel/brw: Rename fs_inst to brw_inst
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33114>
2025-01-31 00:57:21 +00:00
Caio Oliveira
a4afb81729 intel/brw: Use brw prefix for some schedule instructions identifiers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33145>
2025-01-27 18:32:41 +00:00
Lionel Landwerlin
aac906c16c brw: add scheduler support for address registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Caio Oliveira
4d43ee0dd6 intel/brw: Remove uses of VLAs
Was causing trouble in some build configurations, we don't really need
them.  Use ralloc for consistency.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Ian Romanick
ef3dc401da brw: Add devinfo parameter to fs_inst::regs_read
This isn't used now, but future commits will add uses. Doing this as a
separate commit removes a lot of "just typing" churn from commits that
have real changes to review.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Kenneth Graunke
8bca7e520c intel/brw: Only force g0's liveness to be the whole program if spilling
We don't actually need to extend g0's live range to the EOT message
generally - most messages that end a shader are headerless.  The main
implicit use of g0 is for constructing scratch headers.  With the last
two patches, we now consider scratch access that may exist in the IR
and already extend the liveness appropriately.

There is one remaining problem: spilling.  The register allocator will
create new scratch messages when spilling a register, which need to
create scratch headers, which need g0.  So, every new spill or fill
might extend the live range of g0, which would create new interference,
altering the graph.  This can be problematic.

However, when compiling SIMD16 or SIMD32 fragment shaders, we don't
allow spilling anyway.  So, why not use allow g0?  Also, when trying
various scheduling modes, we first try allocation without spilling.
If it works, great, if not, we try a (hopefully) less aggressive
schedule, and only allow spilling on the lowest-pressure schedule.

So, even for regular SIMD8 shaders, we can potentially gain the use
of g0 on the first few tries at scheduling+allocation.

Once we try to allocate with spilling, we go back to reserving g0
for the entire program, so that we can construct scratch headers at
any point.  We could possibly do better here, but this is simple and
reliable with some benefit.

Thanks to Ian Romanick for suggesting I try this approach.

fossil-db on Alchemist shows some more spill/fill improvements:

   Totals:
   Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47%
   Spill count: 52891 -> 52471 (-0.79%)
   Fill count: 101599 -> 100818 (-0.77%)
   Scratch Memory Size: 3292160 -> 3197952 (-2.86%)

   Totals from 416541 (66.59% of 625484) affected shaders:
   Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01%
   Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67%
   Spill count: 420 -> 0 (-inf%)
   Fill count: 781 -> 0 (-inf%)
   Scratch Memory Size: 94208 -> 0 (-inf%)

Witcher 3 shows a 33% reduction in scratch memory size, for example.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:34 -07:00
Caio Oliveira
b98930c770 intel/brw: Move regalloc and scheduling functions out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
17b7e49089 intel/brw: Move out of fs_visitor and rename print instructions
They use the brw_print prefix now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
8ba8e33c39 intel/brw: Simplify @file annotations
Doxygen documentation says

> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.

so we can omit the filename itself when using the annotation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
2024-07-22 22:48:03 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Lionel Landwerlin
ac03cefb28 brw: limit dependencies on SR register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
2024-05-31 20:22:27 +00:00
Lionel Landwerlin
d8b78924c5 brw: use a single virtual opcode to read ARF registers
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
2024-05-31 20:22:27 +00:00
Lionel Landwerlin
6a7e576017 intel/fs: fixup instruction scheduling last grf write tracking
When I bumped the max size of VGRFs, I should have bumped the values
in the scheduler too.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d33aff783d ("intel/fs: add support for sparse accesses")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28188>
2024-04-05 19:46:40 +00:00
Kenneth Graunke
97bf3d3b2d intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND
There's no need for special handling here, it's just a send message
with a trivial g0 header and descriptor.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>
2024-03-05 11:16:20 +00:00
Kenneth Graunke
ad37622a8f intel/brw: Delete legacy texture opcodes
We first generate the logical opcodes, and these days fully lower to
SHADER_OPCODE_SEND.  In the past, we lowered to a non-logical variant
and handled that in the generator.  These days, we were just using the
non-logical opcodes as an awkward intermediate opcode change during
the lowering...which isn't really necessary at all.

This patch eliminates them by using the original logical opcodes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2024-03-01 22:19:51 +00:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Caio Oliveira
1e3fbb1afe intel/brw: Fold fs_instruction_scheduler into instruction_scheduler
And use fs_inst instead of backend_instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:05 +00:00
Caio Oliveira
559d94cd0d intel/brw: Use fs_visitor instead of backend_shader in various passes
And since we are touching them, rename a couple of passes
to follow same name convention as existing ones.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:05 +00:00
Caio Oliveira
8f3c52c1da intel/brw: Remove MRF type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5c93a0e125 intel/brw: Remove Gfx8- remaining opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
85eb672325 intel/brw: Remove Gfx8- code from scheduler
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
c621f75e7b intel/brw: Remove now unused vec4-only opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
a641aa294e intel/brw: Remove vec4 backend
It still exists as part of ELK for older gfx versions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:37 +00:00
Ian Romanick
8fb37ef985 intel/fs: Add fast path for ballot(true)
This doesn't help very much now. A later commit adds a NIR optimization
pass, tentatively called nir_opt_uniform_subgroup, that converts many
kinds of subgroup operations to things involving
bitCount(ballot(true)). This commit makes a huge difference in the
results of that later commit.

No shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165558033 -> 165557519 (-0.00%)
Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00%

Totals from 299 (0.05% of 656117) affected shaders:
Instrs: 88293 -> 87779 (-0.58%)
Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03%

v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:37:46 -08:00
Dave Airlie
8f73cc802c intel/compiler: revert part of "Move earlier scheduler code that is not mode-specific"
This removed a bunch of calls from the vec4 code that aren't called anywhere else.

Bring back the bits that were removed.

Fixes glxgears on gen5

Fixes: 81594d0db1 ("intel/compiler: Move earlier scheduler code that is not mode-specific")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26862>
2024-01-04 00:38:38 +00:00
Ian Romanick
e666872c75 intel/compiler: Initial bits for DPAS instruction
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.

v3: Prevent lower_regioning from trying to "fix" DPAS sources.

v4: Add instruction latency information for scheduling and perf
estimates.

v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.

v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Caio Oliveira
dcb68de656 intel/compiler: Clear up block instructions before re-adding them
Avoids fixing up list pointers that we don't care about anymore -- since
all the instructions will be re-added in a different order anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
a9f95bf687 intel/compiler: Reuse same scheduler for all pre-RA scheduling modes
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
0dd5378ffe intel/compiler: Make scheduler classes take an external mem_ctx
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
04aa2df461 intel/compiler: Separate schedule_node temporary data
Some fields in schedule_node will need to be reset each time they are
used.  The `cand_generation` needs to be back to zero, and both
`unblocked_time` and `parent_count` need to be back to their initial
values, which were pre-calculated.

Rename the initial data fields and add new ones for the temporary data.

Note the helper function is `per node` to allow it "tag along" with an
existing loops.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
81594d0db1 intel/compiler: Move earlier scheduler code that is not mode-specific
This will be useful later on when we reuse the same scheduler for
multiple modes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
73d4e4118a intel/compiler: Tidy up code in scheduler related to reads_remaining
- Just assert in functions we expect it to exist
- Predicate usage with `!post_reg_alloc` to avoid suggest there are more
  combinations.
- Reuse an existing loop to call the count function.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
4f246cf4e7 intel/compiler: Merge child/latency arrays in schedule_node
Values are used together, saves one pointer in schedule_node,
reduces amount of reallocations when children count grows.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
e59a054203 intel/compiler: Move FS specific fields to fs_instruction_scheduler
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
a6297d05ca intel/compiler: Remove virtual calls from scheduler
Pull run() and schedule_instructions() for fs, and pull a very
simplified version of those into a run() for vec4.  Because of the
previous patches the duplication is small.

Since we are touching these, change run() implementations to use the
cfg from the existing reference to the visitor/shader instead of taking
one as argument.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
d76d58cf50 intel/compiler: Cache issue_time information
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
ecd7ffcf78 intel/compiler: Extract scheduling related basic functions
Those will be used in multiple places later.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
8a8dd2db0c intel/compiler: Add only available instructions to scheduling list
The list was used for iterating through all instructions and then
later also to track the available ones.  Now that the array iteration
is used, change how we fill it and rename it to reflect its only job.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
ddff6428c5 intel/compiler: Use array to iterate the scheduler nodes
For all the preparation data collection before the scheduling
actually happens, it is possible to walk the schedule nodes
in order by iterating on the range of the array dedicated to
a given block.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
fe6ac5a184 intel/compiler: Allocate all schedule_nodes at once
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
be012055da intel/compiler: Remove reference to brw_isa_info from schedule_node
It is always the same for all nodes, so use the one available in the
scheduler itself.

Also, per Matt's suggestion, collect is_haswell from devinfo instead of
from a function argument.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Caio Oliveira
6987571737 intel/compiler: Use linear allocator in parts of brw_schedule_instructions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Francisco Jerez
80e9031b44 intel/fs/xe2+: Fix grf_count in post-RA scheduling for updated register file size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Kenneth Graunke
7eba19245d intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions()
I'm going to introduce another call site for this function, and just
handling SCHEDULE_NONE in the scheduler itself makes more sense than
duplicating the logic.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Emma Anholt
10b94772d2 intel: Reduce cost of resetting last_grf_write.
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler.  Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
7d4769e802 intel: Allocate the last_grf_write once per scheduler.
No need to re-calloc it per block when we're going to use it again.  Also,
this fixes the vec4 backend to avoid allocating giant grf_count-sized
arrays on the stack.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
2ad865b219 intel: Count reads_remaining across all blocks.
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?".  Saves some memsetting, which was showing up high in the
profile (but not from this source).

No change on iris SKL shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:55 +00:00
Lionel Landwerlin
a66944dfbc intel/fs: reuse descriptor helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:36 +00:00
Lionel Landwerlin
9471ffa70a intel/fs: fix scheduling of HALT instructions
With the following test :

dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_out_of_bounds_load

There is a :

shader_start:
   ...                                 <- no control flow
   g0 = some_alu
   g1 = fbl
   g2 = broadcast g3, g1
   g4 = get_buffer_size g2
   ...                                 <- no control flow
   halt                                <- on some lanes
   g5 = send <surface>, g4

eliminate_find_live_channel will remove the fbl/broadcast because it
assumes lane0 is active at get_buffer_size :

shader_start:
   ...                                 <- no control flow
   g0 = some_alu
   g4 = get_buffer_size g0
   ...                                 <- no control flow
   halt                                <- on some lanes
   g5 = send <surface>, g4

But then the instruction scheduler will move the get_buffer_size after
the halt :

shader_start:
   ...                                 <- no control flow
   halt                                <- on some lanes
   g0 = some_alu
   g4 = get_buffer_size g0
   g5 = send <surface>, g4

get_buffer_size pulls the surface index from lane0 in g0 which could
have been turned off by the halt and we end up accessing an invalid
surface handle.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20765>
2023-05-05 00:43:25 +03:00