Commit graph

3754 commits

Author SHA1 Message Date
Alyssa Rosenzweig
d99c2ef059 nir/opt_uniform_atomics: add fs atomics predicated? flag
on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in
this pass is redundant. and would cause a problem since we'd then have to lower
the "is helper inv?" flag late. so just skip the extra lowering code.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>
2024-08-06 11:48:17 -04:00
Kenneth Graunke
c19e5a0a75 intel/brw: Replace predicated break optimization with a simple peephole
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.

For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it.  If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF.  Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.

For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE.  If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.

There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks.  So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.

However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions).  In affected
shaders, it was -0.08% for instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
fad63d6483 intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass
With the select peephole gone, this no longer does much of anything.

No instruction changes in fossil-db on Alchemist.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
06e8335e11 intel/brw: Delete the brw_fs_opt_peephole_select() pass
Now that we can handle load_ubo in NIR's peephole select pass, the
backend pass isn't really useful anymore.

fossil-db results on Alchemist show almost no impact:

   Totals:
   Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00%
   Cycles: 12633748945 -> 12633760459 (+0.00%)

   Totals from 261 (0.04% of 630008) affected shaders:
   Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14%
   Cycles: 23947172 -> 23958686 (+0.05%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
7c579f448f intel/brw: Mark all UBO access with a direct buffer index as speculative
UBO loads with a non-indirect buffer index should be safe to perform
speculatively.  With a direct offset, we may sometimes turn them into
push constants, at which point it's just reading a register with no
cost at all.  Otherwise, we access them via messages that use surface
state, and automatically perform bounds checking.  So we shouldn't have
any issues with reading out of bounds and page faulting, for example.

This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics,
so we can turn simple if's with loads on both sides to bcsels.  In some
cases this can collapse a surprising amount of control flow, allowing
other optimizations to work better.

The i965 OpenGL driver used load_uniform intrinsics, which are allowed
in NIR's peephole select pass.  But iris uses the Gallium NIR pass that
translates uniforms to loads from UBO 0, so we haven't been able to take
advantage of NIR's peephole select pass there.  The backend pass was
still able to handle this to some extent, however.

fossil-db results on Alchemist:

   Totals:
   Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00%
   Send messages: 7416330 -> 7416261 (-0.00%)
   Spill count: 52471 -> 52473 (+0.00%)
   Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00%
   Scratch Memory Size: 3197952 -> 3198976 (+0.03%)

   Totals from 1848 (0.29% of 630003) affected shaders:
   Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02%
   Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03%
   Send messages: 59829 -> 59760 (-0.12%)
   Spill count: 3870 -> 3872 (+0.05%)
   Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02%
   Scratch Memory Size: 174080 -> 175104 (+0.59%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Iván Briano
f8553f56ac intel/rt: fix terminateOnFirstHit handling
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.

But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.

Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of terminating
handling the ray termination if necessary.

Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first

Cc: mesa-stable

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>
2024-08-05 21:43:36 +00:00
Caio Oliveira
ba3fd5dc57 intel/brw: Don't retype load_subgroup_invocation result to signed
The values are small unsigned integers, so their signed representation
will be the same -- the sign conversion is not needed.  As a result the
extra MOV can be elided by the optimizations.

Fossil-db results for DG2

```
Totals:
Instrs: 151779000 -> 151761591 (-0.01%)
Cycle count: 12743968649 -> 12742826024 (-0.01%); split: -0.01%, +0.00%
Max live registers: 31834993 -> 31834996 (+0.00%)

Totals from 17018 (2.70% of 631450) affected shaders:
Instrs: 2381740 -> 2364331 (-0.73%)
Cycle count: 76798588 -> 75655963 (-1.49%); split: -1.70%, +0.22%
Max live registers: 378921 -> 378924 (+0.00%)
```

and TGL

```
Totals:
Instrs: 149812033 -> 149794080 (-0.01%); split: -0.01%, +0.00%
Cycle count: 11534727002 -> 11534929834 (+0.00%); split: -0.01%, +0.01%
Spill count: 42510 -> 42511 (+0.00%); split: -0.00%, +0.01%
Fill count: 75100 -> 75101 (+0.00%); split: -0.00%, +0.00%
Max live registers: 31727318 -> 31727321 (+0.00%)

Totals from 17421 (2.76% of 630458) affected shaders:
Instrs: 3092614 -> 3074661 (-0.58%); split: -0.58%, +0.00%
Cycle count: 286061417 -> 286264249 (+0.07%); split: -0.32%, +0.39%
Spill count: 11538 -> 11539 (+0.01%); split: -0.02%, +0.03%
Fill count: 21359 -> 21360 (+0.00%); split: -0.01%, +0.01%
Max live registers: 418954 -> 418957 (+0.00%)
```

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30508>
2024-08-05 18:05:45 +00:00
Jordan Justen
58469620d3 intel/brw/validate: Convert access mask to be grf based
Our validation code doesn't need to know which bytes are accessed. It
only needs to know which grfs were accessed by an element. This also
helps to easily handle the Xe2 register size change.

Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
2024-08-02 22:18:51 +00:00
Jordan Justen
e62606b2ec intel/brw/validate: Update dst grf crossing check for Xe2
Rework:
 * Update grf_size_shift calculation (s-b Ken)

Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
2024-08-02 22:18:51 +00:00
Jordan Justen
f2800deacb intel/brw/validate: Simplify grf span validation check by not using a mask
Previously this check would create a mask of the bytes used in the
grf, and then shift the mask. This worked well when there was 32 bytes
in the register because a 64-bit uint64_t could easily detect that
bytes were used in the next regiter. (The next register was the high
32-bits of the `access_mask` variable.)

With Xe2, the register size becomes 64 bytes, meaning this strategy
doesn't work. Instead of a mask, we can just check to see if more than
1 grfs are used during each loop iteration. (Suggested by Ken.) This
will make it easier to extend for Xe2 in a follow on commit.

Verified this with
dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize
on Xe2, which otherwise would cause the program to fail to validate
because it assumed a grf was 32 bytes.

Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
2024-08-02 22:18:51 +00:00
Kenneth Graunke
8bca7e520c intel/brw: Only force g0's liveness to be the whole program if spilling
We don't actually need to extend g0's live range to the EOT message
generally - most messages that end a shader are headerless.  The main
implicit use of g0 is for constructing scratch headers.  With the last
two patches, we now consider scratch access that may exist in the IR
and already extend the liveness appropriately.

There is one remaining problem: spilling.  The register allocator will
create new scratch messages when spilling a register, which need to
create scratch headers, which need g0.  So, every new spill or fill
might extend the live range of g0, which would create new interference,
altering the graph.  This can be problematic.

However, when compiling SIMD16 or SIMD32 fragment shaders, we don't
allow spilling anyway.  So, why not use allow g0?  Also, when trying
various scheduling modes, we first try allocation without spilling.
If it works, great, if not, we try a (hopefully) less aggressive
schedule, and only allow spilling on the lowest-pressure schedule.

So, even for regular SIMD8 shaders, we can potentially gain the use
of g0 on the first few tries at scheduling+allocation.

Once we try to allocate with spilling, we go back to reserving g0
for the entire program, so that we can construct scratch headers at
any point.  We could possibly do better here, but this is simple and
reliable with some benefit.

Thanks to Ian Romanick for suggesting I try this approach.

fossil-db on Alchemist shows some more spill/fill improvements:

   Totals:
   Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47%
   Spill count: 52891 -> 52471 (-0.79%)
   Fill count: 101599 -> 100818 (-0.77%)
   Scratch Memory Size: 3292160 -> 3197952 (-2.86%)

   Totals from 416541 (66.59% of 625484) affected shaders:
   Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01%
   Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67%
   Spill count: 420 -> 0 (-inf%)
   Fill count: 781 -> 0 (-inf%)
   Scratch Memory Size: 94208 -> 0 (-inf%)

Witcher 3 shows a 33% reduction in scratch memory size, for example.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:34 -07:00
Kenneth Graunke
4ca4b064cf intel/brw: Record g0 as live for sends with send_ex_desc_scratch set
brw_send_indirect_split_message() implicitly reads g0 to construct the
extended message descriptor for certain send messages when this is set.

Record that liveness explicitly.

Thanks to Francisco Jerez for reminding me about this use of g0.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:32 -07:00
Kenneth Graunke
9200fb966c intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0
The generator code for emitting legacy scratch headers was implicitly
using g0 as a source.  But the IR wasn't indicating any usage of g0,
which means the liveness isn't properly tracked at the IR level.

It works because we reserve g0 as permanently live for the whole
program.  In order to stop doing that, we need to record it properly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:31 -07:00
Kenneth Graunke
545f20419f intel/brw: Delete fs_reg_alloc::discard_interference_graph()
Unused since commit 50519598ff.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:28 -07:00
Sushma Venkatesh Reddy
0116430d39 intel/brw: Handle 16-bit sampler return payloads
API requires samplers to return 32-bit even though hardware can handle
16-bit floating point, so we detect that case and make more efficient
use of memory BW. This is helping improve performance of encode and
decode tokens during LLM by at least 5% across multiple platforms.

Thank you Kenneth Graunke for suggesting and guiding me throughout
this implementation.

Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
2024-07-31 21:26:46 +00:00
Sushma Venkatesh Reddy
ddd9e043dc intel/brw: Move get_nir_def() higher to avoid UNDEF
While extending our backend to handle 16-bit sampler return payloads, we
found that in piglit's arb_texture_view-rendering-formats, the SIMD8 FS
was missing the sampling operation altogether. This was because we were
first emitting the texturing instruction, and then calling
get_nir_def(), which adds an UNDEF instruction when the destination is
smaller than the 32-bit. So the texturing was dead code elimated. Fix
this by calling get_nir_def() earlier.

Thank you to Kenneth Graunke for suggesting and guiding me throughout
this implementation.

Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
2024-07-31 21:26:46 +00:00
Caio Oliveira
52be72e676 intel: Let compiler set indirect_ubos_use_sampler
This option is used for Gfx < 12, elk already set it to true,
so set it in brw and change the drivers to not set it anymore.

Because the dual-compiler support in Iris, the helper function
there had to change to consult the right compiler value instead.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30393>
2024-07-31 19:26:20 +00:00
Ian Romanick
fdb6afe71e intel/elk: Fix undefined left shift of negative value in elk_texture_offset
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:18:08 -07:00
Ian Romanick
f3f4a057b9 intel/elk: Fix undefined left shift of large UW value in elk_imm_uw
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:18:06 -07:00
Ian Romanick
0e5ac7d6b0 intel/elk: Fix undefined left shift of negative value in update_uip_jip
v2: Add comment and assertion to explain why the shift is
safe. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:18:04 -07:00
Ian Romanick
c2dda8c8e7 intel/elk: Fix undefined shift by 64 of uint64_t in elk_compute_first_urb_slot_required
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:18:01 -07:00
Ian Romanick
e6669467b8 intel/brw: Fix undefined left shift of negative value in brw_texture_offset
When -fsanitize=shift is used, many instances of the following are
produced:

src/intel/compiler/brw_fs_nir.cpp:114:30: runtime error: left shift of negative value -1

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:17:59 -07:00
Ian Romanick
4f24c2707f intel/brw: Fix undefined left shift of large UW value in brw_imm_uw
When -fsanitize=shift is used, 'ninja test' would fail in several
Intel assembly tests (mul.asm and and.asm) with:

src/intel/compiler/brw_reg.h:703:22: runtime error: left shift of 65532 by 16 places cannot be represented in type 'int'

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:17:56 -07:00
Ian Romanick
abb7c012ff intel/brw: Fix undefined left shift of negative value in update_uip_jip
When -fsanitize=shift is used, many instances of the following are
produced:

src/intel/compiler/brw_eu_compact.c:2244:50: runtime error: left shift of negative value -306

v2: Add comment and assertion to explain why the shift is
safe. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:17:53 -07:00
Ian Romanick
228e049db6 intel/brw: Fix undefined shift by 64 of uint64_t in brw_compute_first_urb_slot_required
When -fsanitize=shift is used, many instances of the following are
produced:

src/intel/compiler/brw_compiler.h:1661:44: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int'

I think this is an actual bug. It should check the sentinel value, but
the sentinel value is 64. The shift by 64 is treated as a shift by
0. The varying 0 is explicitly filtered by the rest of the
if-test. How does this work?

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
2024-07-26 17:17:15 -07:00
Sushma Venkatesh Reddy
455deacbce intel/brw: Fix DEBUG_OPTIMIZER
Due to recent regression, adding INTEL_DEBUG=optimizer is dumping
shader optimization pass details to console rather than to respective
files.
Thank you, Kenneth W Graunke for helping me figure this out.

Fixes: 17b7e49089 ("intel/brw: Move out of fs_visitor and rename print instructions")

Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30389>
2024-07-26 22:22:58 +00:00
Caio Oliveira
23b0798551 intel/brw: Move interp_reg and per_primitive_reg out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
a5cc8c4807 intel/brw: Move VARYING_PULL_CONSTANT_LOAD from fs_visitor to fs_builder
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
8a39231e4f intel/brw: Move calculate_cfg out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
b98930c770 intel/brw: Move regalloc and scheduling functions out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
5cb1f46fd1 intel/brw: Remove workgroup_size() helper from fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
17b7e49089 intel/brw: Move out of fs_visitor and rename print instructions
They use the brw_print prefix now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
bb7f2db5a2 intel/brw: Move printing functions to its own file
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
cdbee4156e intel/brw: Reduce scope of some MESH specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
67ead4edff intel/brw: Reduce scope of some TES specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
f9ddf51b70 intel/brw: Reduce scope of some TCS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
47b9dc9070 intel/brw: Reduce scope of some GS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
28858b3ad1 intel/brw: Reduce scope of some FS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
a8b4b9dd51 intel/brw: Reduce scope of some VS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
fdb029fe1b intel/brw: Move and reduce scope of run_*() functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
c92b8a802e intel/brw: Move remaining compile stages to their own files
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Matt Turner
a3714b55f4 intel/elk: Use REG_CLASS_COUNT
Fixes: d44462c08d ("intel/elk: Fork Gfx8- compiler by copying existing code")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>
2024-07-25 14:55:09 +00:00
Matt Turner
5e24c21625 intel/brw: Use REG_CLASS_COUNT
Fixes: 5d87f41a54 ("intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>
2024-07-25 14:55:09 +00:00
Matt Turner
aae82061af intel/clc: Free disk_cache
Fixes: c15bf88f01 ("intel: Add a little OpenCL C compiler binary")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30313>
2024-07-24 20:46:28 +00:00
Matt Turner
1574372de4 intel/clc: Free parsed_spirv_data
This declaration shadowed a variable by the same type and name in an
outer scope. That variable is passed to clc_free_parsed_spirv().

Fixes: 4fd7495c69 ("intel/clc: add ability to output NIR")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30313>
2024-07-24 20:46:28 +00:00
Marek Olšák
b2d32ae246 nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag
Instead of having 1 bit in nir_io_semantics indicating a per-primitive
FS input, add a dedicated intrinsic for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>
2024-07-23 16:13:16 +00:00
Kenneth Graunke
c429d5025e intel/brw: Don't force g1's live range to be the entire program
The idea here was that pixel shader framebuffer writes used the g0 and
g1 thread payload register values to construct the message header.
However, most messages are headerless and don't use either.  There's a
2012-era comment that the simulator at one point had a bug where certain
headerless messages would incorrectly take the values from the g0/g1
register contents rather than using sideband.  But, that was likely
fixed eons ago.  So we really don't need to do this.

Furthermore, there are many more shader stages these days:
- VS: r1 contains output URB handles
- TCS: r1 contains ICP handles
- TES: r1 contains gl_TessCoord.x (r4 contains output URB handles)
- GS: r1 contains output URB handles
- CS: r1 contains LocalID.X on DG2+ but nothing on older hardware
- Task/Mesh: r1 contains LocalID.X
- BS: r1 contains bindless stack handles

Vertex and geometry aren't likely to benefit here because r1 is needed
for their output messages, which are also what terminate the shader.
TES will definitely benefit because we were making a value pointlessly
live for the whole program.  Same for TCS, to a lesser extent.  Compute
prior to DG2 was the worst, as g1 literally has no meaningful content,
so there is no point to keeping it live.

fossil-db on Alchemist shows substantial spill/fill improvements:

   Totals:
   Instrs: 148782351 -> 148741996 (-0.03%); split: -0.03%, +0.01%
   Cycles: 12602907531 -> 12605795191 (+0.02%); split: -0.70%, +0.72%
   Subgroup size: 7518608 -> 7518632 (+0.00%)
   Send messages: 7341727 -> 7341762 (+0.00%)
   Spill count: 54633 -> 52575 (-3.77%)
   Fill count: 104694 -> 100680 (-3.83%)
   Scratch Memory Size: 3375104 -> 3287040 (-2.61%)

   Totals from 301172 (48.21% of 624670) affected shaders:
   Instrs: 95531927 -> 95491572 (-0.04%); split: -0.05%, +0.01%
   Cycles: 9643531593 -> 9646419253 (+0.03%); split: -0.91%, +0.94%
   Subgroup size: 4492512 -> 4492536 (+0.00%)
   Send messages: 4399737 -> 4399772 (+0.00%)
   Spill count: 20034 -> 17976 (-10.27%)
   Fill count: 41530 -> 37516 (-9.67%)
   Scratch Memory Size: 1522688 -> 1434624 (-5.78%)

Assassin's Creed Odyssey in particular has 20% fewer fills.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30146>
2024-07-23 02:26:52 +00:00
Caio Oliveira
8ba8e33c39 intel/brw: Simplify @file annotations
Doxygen documentation says

> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.

so we can omit the filename itself when using the annotation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
2024-07-22 22:48:03 +00:00
José Roberto de Souza
de5d767f9a intel/brw: Add a maximum scratch size restriction
Gfx 12.5 moved scratch to a surface and SURFTYPE_SCRATCH has this pitch
restriction:

RENDER_SURFACE_STATE::Surface Pitch
   For surfaces of type SURFTYPE_SCRATCH, valid range of pitch is:
   [63,262143] -> [64B, 256KB]

The pitch of the surface is the scratch size per thread and the surface
should be large enough to accommodate every physical thread.

So here adding a new field to intel_device_info, setting it in
intel_device_info_init_common() so even offline tools can have it set.
And finally adding a check to fail shader compilation if needed
scratch is larger than supported.

This issue can be reproduced in debug builds when running
dEQP-VK.protected_memory.stack.stacksize_1024 on Gfx 12.5 or newer
platforms.

Ref: BSpec 43862 (r52666)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30271>
2024-07-22 18:17:38 +00:00
Francisco Jerez
b98eebbcb2 intel/brw: Implement null push constant workaround.
This implements an undocumented workaround for a hardware bug that
affects draw calls with a pixel shader that has 0 push constant cycles
when TBIMR is enabled, which has been seen to lead to a hang with
Fallout 3 and Metal Gear Rising Revengeance.  This hardware bug has
been reported as HSDES#22020184996 which is still pending a resolution
by the hardware team.  However since this workaround found empirically
has been confirmed to fix the issue reliably and it's relatively
harmless it seems worth checking in already even though no final W/A
number is available nor has the W/A json file been updated.

To avoid the issue we simply pad the push constant payload to be at
least 1 register.  This is enabled via a brw_wm_prog_key since the
driver needs to be in agreement with the compiler on whether the dummy
push constant cycle is present, and it can be avoided in cases where
the driver knows that TBIMR will be disabled (e.g. for BLORP).

Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399
Fixes: 57decad976 ("intel/xehp: Enable TBIMR by default.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>
2024-07-20 01:13:19 +00:00