Commit graph

340 commits

Author SHA1 Message Date
Sviatoslav Peleshko
2a4efe21c5 intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11928
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31746>
2024-10-23 15:02:27 +00:00
Lionel Landwerlin
97b17aa0b1 brw/nir: rework inline_data_intel to work with compute
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in compute shaders on Gfx12.5+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Caio Oliveira
9537b62759 intel/brw: Add SHADER_OPCODE_REDUCE
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
bf9456753d intel/brw: Validate some instructions exist only up until some phases
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
affa7567c2 intel/brw: Add phases to backend
The general idea is to be able to validate that certain instructions
were lowered and certain restrictions were already handled.  Passes can
now assert their expectations, i.e. whether a pass is meant to run after
certain lowerings or not.

The actual phases are an initial stab; as we re-organize the passes,
we may remove or add phases.

This commit just adds the phase steps; later commits will make use of
them.
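
A rough illustration of the idea, with invented phase names and types
rather than the ones this commit actually adds:

   #include <cassert>

   enum brw_shader_phase {            /* hypothetical phase names */
      PHASE_INITIAL,
      PHASE_AFTER_NIR,
      PHASE_AFTER_OPT_LOOP,
      PHASE_AFTER_LOWERING,
   };

   struct shader {
      brw_shader_phase phase = PHASE_INITIAL;
   };

   static void some_late_pass(shader &s)
   {
      /* This pass only makes sense once lowering has happened. */
      assert(s.phase >= PHASE_AFTER_LOWERING);
      /* ... actual work ... */
   }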

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
2811cb2923 intel: Add statistic for Non SSA registers after NIR to BRW
This is going to be useful while we convert the NIR-to-BRW translation
to produce SSA definitions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
6db7d1af16 intel/compiler: Rename shader_stats structs
Add the `brw_` and `elk_` prefixes to the structs to avoid a compilation
failure when building with LTO ("violates the C++ One Definition Rule")
if the structs diverge.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Kenneth Graunke
c19e5a0a75 intel/brw: Replace predicated break optimization with a simple peephole
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.

For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it.  If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF.  Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.

For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE.  If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
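
A minimal sketch of the break/continue peephole, using invented stand-in
types rather than the real backend IR:

   /* Illustrative only: a doubly-linked list of instructions, no CFG yet. */
   struct inst {
      enum opcode_t { IF, ENDIF, BREAK, CONTINUE, OTHER } op;
      int predicate;                /* 0 = not predicated */
      bool predicate_inverse;
      inst *prev, *next;
   };

   static void unlink_inst(inst *i)
   {
      i->prev->next = i->next;
      i->next->prev = i->prev;
   }

   /* Called right after emitting an ENDIF. */
   static void opt_predicated_jump(inst *endif_inst)
   {
      inst *jump = endif_inst->prev;
      if (jump->op != inst::BREAK && jump->op != inst::CONTINUE)
         return;
      inst *if_inst = jump->prev;
      if (if_inst->op != inst::IF)
         return;
      /* Move the predicate from the IF onto the jump, drop IF/ENDIF. */
      jump->predicate = if_inst->predicate;
      jump->predicate_inverse = if_inst->predicate_inverse;
      unlink_inst(if_inst);
      unlink_inst(endif_inst);
   }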

There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks.  So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.

However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions).  In affected
shaders, it was -0.08% for instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
fad63d6483 intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass
With the select peephole gone, this no longer does much of anything.

No instruction changes in fossil-db on Alchemist.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
06e8335e11 intel/brw: Delete the brw_fs_opt_peephole_select() pass
Now that we can handle load_ubo in NIR's peephole select pass, the
backend pass isn't really useful anymore.

fossil-db results on Alchemist show almost no impact:

   Totals:
   Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00%
   Cycles: 12633748945 -> 12633760459 (+0.00%)

   Totals from 261 (0.04% of 630008) affected shaders:
   Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14%
   Cycles: 23947172 -> 23958686 (+0.05%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
8bca7e520c intel/brw: Only force g0's liveness to be the whole program if spilling
We don't actually need to extend g0's live range to the EOT message
generally - most messages that end a shader are headerless.  The main
implicit use of g0 is for constructing scratch headers.  With the last
two patches, we now consider scratch access that may exist in the IR
and already extend the liveness appropriately.

There is one remaining problem: spilling.  The register allocator will
create new scratch messages when spilling a register, which need to
create scratch headers, which need g0.  So, every new spill or fill
might extend the live range of g0, which would create new interference,
altering the graph.  This can be problematic.

However, when compiling SIMD16 or SIMD32 fragment shaders, we don't
allow spilling anyway.  So, why not allow the use of g0?  Also, when trying
various scheduling modes, we first try allocation without spilling.
If it works, great, if not, we try a (hopefully) less aggressive
schedule, and only allow spilling on the lowest-pressure schedule.

So, even for regular SIMD8 shaders, we can potentially gain the use
of g0 on the first few tries at scheduling+allocation.

Once we try to allocate with spilling, we go back to reserving g0
for the entire program, so that we can construct scratch headers at
any point.  We could possibly do better here, but this is simple and
reliable with some benefit.
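
Sketched as plain C++ with invented names (not the real allocator
interface), the decision reduces to pinning g0 only when this allocation
attempt may spill:

   struct live_range { int start, end; };

   /* Only spill/fill code implicitly needs g0 for scratch headers, so pin
    * it for the whole program only when spilling is allowed; otherwise keep
    * the range computed from actual IR uses (e.g. SCRATCH_HEADER sources). */
   static live_range g0_live_range(bool allow_spilling, int num_ips,
                                   live_range from_ir_uses)
   {
      if (allow_spilling)
         return { 0, num_ips };      /* reserved for the entire program */
      return from_ir_uses;
   }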

Thanks to Ian Romanick for suggesting I try this approach.

fossil-db on Alchemist shows some more spill/fill improvements:

   Totals:
   Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47%
   Spill count: 52891 -> 52471 (-0.79%)
   Fill count: 101599 -> 100818 (-0.77%)
   Scratch Memory Size: 3292160 -> 3197952 (-2.86%)

   Totals from 416541 (66.59% of 625484) affected shaders:
   Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01%
   Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67%
   Spill count: 420 -> 0 (-inf%)
   Fill count: 781 -> 0 (-inf%)
   Scratch Memory Size: 94208 -> 0 (-inf%)

Witcher 3 shows a 33% reduction in scratch memory size, for example.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:34 -07:00
Kenneth Graunke
9200fb966c intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0
The generator code for emitting legacy scratch headers was implicitly
using g0 as a source.  But the IR wasn't indicating any usage of g0,
which means the liveness isn't properly tracked at the IR level.

It works because we reserve g0 as permanently live for the whole
program.  In order to stop doing that, we need to record it properly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:31 -07:00
Caio Oliveira
23b0798551 intel/brw: Move interp_reg and per_primitive_reg out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
a5cc8c4807 intel/brw: Move VARYING_PULL_CONSTANT_LOAD from fs_visitor to fs_builder
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
8a39231e4f intel/brw: Move calculate_cfg out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
b98930c770 intel/brw: Move regalloc and scheduling functions out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
5cb1f46fd1 intel/brw: Remove workgroup_size() helper from fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
17b7e49089 intel/brw: Move out of fs_visitor and rename print instructions
They use the brw_print prefix now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
cdbee4156e intel/brw: Reduce scope of some MESH specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
67ead4edff intel/brw: Reduce scope of some TES specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
f9ddf51b70 intel/brw: Reduce scope of some TCS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
47b9dc9070 intel/brw: Reduce scope of some GS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
28858b3ad1 intel/brw: Reduce scope of some FS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
a8b4b9dd51 intel/brw: Reduce scope of some VS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
fdb029fe1b intel/brw: Move and reduce scope of run_*() functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
d00329e821 intel/brw: Replace some fs_reg constructors with functions
Create three helper functions for ATTR, UNIFORM and VGRF creation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:18 +00:00
Sagar Ghuge
99ce8b5a07 intel/compiler: Add indirect mov lowering pass
Indirect addressing (vx1 and vxh) is not supported with the UB/B data type
for src0, so we need to change the data type for both dest and src0.

This fixes following tests cases on Xe2+
 - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_16*
 - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_32*

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
2024-07-01 19:06:31 +00:00
Kenneth Graunke
1e69ec3b8d intel/brw: Add a lower_csel pass and allow building it for all types
We can do CSEL on F, HF, *W, and *D on Gfx11+.  Gfx9 can only do F.

We can lower unsupported types to CMP+CSEL, allowing us to use CSEL
in the IR and not worry about the limitations.

Rework: (Sagar)
- Update validation pass for CSEL

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
2024-07-01 19:06:31 +00:00
Francisco Jerez
95eec5a0dd intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel offset.
This implements a replacement for the previous implementation of
nir_intrinsic_load_barycentric_at_offset that relied on the Pixel
Interpolator shared function, since it's going to be removed from the
hardware from Xe2 onwards.

That's okay since we can get all the primitive setup information
needed for interpolation at an arbitrary coordinate: We use the X/Y
offset relative to the "X/Y Start" coordinates from the thread payload in
order to evaluate the plane equations also provided in the thread
payload for each barycentric coordinate of each polygon.  The
evaluation of the barycentric plane equations (and the RHW plane
equation for perspective-correct interpolation) uses the accumulator
and MAD/MAC for ALU efficiency, but that means we need to manually
split instructions to fit the width of the accumulator.  The division
and scaling for perspective-correct interpolation is also now done in
the shader if necessary.
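
As a rough scalar model of the math (plain C++; the plane layout
value = a*x + b*y + c and the convention that the barycentric planes
interpolate b/w while the RHW plane interpolates 1/w are assumptions of
this sketch - the real pass emits MAD/MAC through the accumulator and
splits instructions to its width):

   struct plane { float a, b, c; };      /* value(x, y) = a*x + b*y + c */

   /* dx/dy are the per-channel offsets from the payload's X/Y Start. */
   static void bary_at_offset(const plane bary[2], const plane &rhw,
                              float dx, float dy, bool perspective,
                              float out[2])
   {
      float b0 = bary[0].a * dx + bary[0].b * dy + bary[0].c;
      float b1 = bary[1].a * dx + bary[1].b * dy + bary[1].c;
      if (perspective) {
         /* Rescale by the interpolated RHW plane for perspective
          * correctness (the division/scaling now done in the shader). */
         float w = 1.0f / (rhw.a * dx + rhw.b * dy + rhw.c);
         b0 *= w;
         b1 *= w;
      }
      out[0] = b0;
      out[1] = b1;
   }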

Note that even though this is only immediately useful on Xe2+, the
thread payload numbers are filled out for older platforms, and the EU
restrictions of previous Xe platforms are taken into account, mostly
for the purposes of testing and performance evaluation.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
2024-06-27 00:18:00 +00:00
Caio Oliveira
b59ea3d63f intel/brw: Print SWSB information when dumping instructions
These were previously only shown as part of the disassembly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29738>
2024-06-23 08:09:56 -07:00
Kenneth Graunke
580e1c592d intel/brw: Introduce a new SSA-based copy propagation pass
(Quite a few of the restrictions here are ported from the old pass.)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
9690bd369d intel/brw: Delete old local common subexpression elimination pass
We no longer use this older pass, so there's no need to keep it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
234c45c929 intel/brw: Write a new global CSE pass that works on defs
This has a number of advantages compared to the pass I wrote years ago:

- It can easily perform either Global CSE or block-local CSE, without
  needing to roll any dataflow analysis, thanks to SSA def analysis.
  This global CSE is able to detect and coalesce memory loads across
  blocks.  Although it may increase spilling a little, the reduction
  in memory loads seems to more than compensate.

- Because SSA guarantees that values are never written more than once,
  the new CSE pass can directly reuse an existing value.  The old pass
  emitted copies at the point where it discovered a value because it
  had no idea whether it'd be mutated later.  This led it to generate
  a ton of trash for copy propagation to clean up later, and also a
  nasty fragility where CSE, register coalescing, and copy propagation
  could all fight one another by generating and cleaning up copies,
  leading to infinite optimization loops unless we were really careful.
  Generating less trash improves our CPU efficiency.

- It uses hash tables like nir_instr_set and nir_opt_cse, instead of
  linearly walking lists and comparing each element.  This is much more
  CPU efficient.

- It doesn't use liveness analysis, which is one of the most expensive
  analysis passes that we have.  Def analysis is cheaper.

In addition to CSE'ing SSA values, we continue to handle flag writes,
as this is a huge source of CSE'able values.  These remain block local.
However, we can simply track the last flag write, rather than creating
entire sets of instruction entries like the old pass.  Much simpler.
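
A toy version of the hashing idea, using standard C++ containers and an
invented key layout rather than the real backend instruction hashing:

   #include <cstddef>
   #include <functional>
   #include <unordered_map>
   #include <vector>

   struct value_key {
      int opcode;
      std::vector<int> srcs;            /* SSA def numbers of the sources */
      bool operator==(const value_key &o) const {
         return opcode == o.opcode && srcs == o.srcs;
      }
   };

   struct value_key_hash {
      std::size_t operator()(const value_key &k) const {
         std::size_t h = std::hash<int>()(k.opcode);
         for (int s : k.srcs)
            h = h * 31u + std::hash<int>()(s);
         return h;
      }
   };

   /* On a hit, later uses of 'def' are remapped to the earlier def; no copy
    * instruction is emitted, since SSA values are never rewritten. */
   static void cse_instruction(const value_key &key, int def,
                               std::unordered_map<value_key, int,
                                                  value_key_hash> &seen,
                               std::unordered_map<int, int> &remap)
   {
      auto it = seen.find(key);
      if (it != seen.end())
         remap[def] = it->second;
      else
         seen.emplace(key, def);
   }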

The only real downside to this pass is that, because the backend is
currently only partially SSA, it has limited visibility and isn't able
to see all values.  However, the results appear to be good enough that
the new pass can effectively replace the old pass in almost all cases.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
2b30b3bbd4 intel/brw: Print defs in dump_instructions
Like NIR, we print SSA defs as %1, %2, and so on.  The number here is
the VGRF number.  VGRFs that don't correspond to an SSA def remain
printed as vgrf1, vgrf2, and so on.

This makes it much easier to see what values are SSA and which aren't.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Caio Oliveira
08da7edc0e intel/brw: Track the number of uses of each def in def_analysis
Even without a full use list, simply tracking the number of uses will
let us tell "this is the only use of the def" or "we've just replaced
all uses of a def".  It's inexpensive to calculate and will be useful.
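
As a trivial sketch in plain C++ (not the actual analysis structures),
counting uses is a single pass over every instruction's sources:

   #include <vector>

   /* insts: for each instruction, the def numbers of its sources
    * (-1 when a source is not an SSA def). */
   static std::vector<unsigned>
   count_def_uses(const std::vector<std::vector<int>> &insts,
                  unsigned num_defs)
   {
      std::vector<unsigned> uses(num_defs, 0);
      for (const auto &srcs : insts)
         for (int d : srcs)
            if (d >= 0)
               uses[d]++;
      return uses;
   }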

(rebased by Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
0d144821f0 intel/brw: Add a new def analysis pass
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.
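
Roughly, and with the qualifying conditions assumed for illustration
rather than taken from the pass itself, a VGRF counts as a def when it
has exactly one unpredicated write covering the whole register:

   /* Per-VGRF facts gathered in one walk over the instructions
    * (illustrative fields only; the real pass tracks more). */
   struct vgrf_writes {
      unsigned count = 0;
      bool any_predicated = false;
      bool any_partial = false;
   };

   static bool is_ssa_def(const vgrf_writes &w)
   {
      return w.count == 1 && !w.any_predicated && !w.any_partial;
   }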

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
84219892ad intel/brw: Make gl_SubgroupInvocation lane index loading SSA
Our code to initialize gl_SubgroupInvocation uses multiple instructions,
some of which are partial writes.  This makes it difficult to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.

To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation.  (We also expand
it to UD for SIMD8 so there are no partial-write issues.)  We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Lionel Landwerlin
1bbe2d9833 intel/brw: fixup wm_prog_data_barycentric_modes()
Always select sample barycentric when persample dispatch is unknown at
compile time and let the payload adjustments feed the expected value
based on dispatch.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
2024-04-26 05:13:02 +00:00
Kenneth Graunke
f523bfcf90 intel/brw: Reindent after shortening BRW_REGISTER_TYPE_* to BRW_TYPE_*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Caio Oliveira
ff89e83178 intel/brw: Lower VGRFs to FIXED_GRFs earlier
Moves the lowering of VGRFs into FIXED_GRFs from code generation to
(almost) right after register allocation.

This will (1) let later passes not worry about VGRFs (and what they mean
in a post-reg-alloc phase) and (2) make it easier to add certain types of
post-reg-alloc validation using the backend IR.

Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
2024-04-23 23:17:57 +00:00
Caio Oliveira
13093ceb3c intel/brw: Move validate out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
2024-04-22 13:38:41 -07:00
Kenneth Graunke
d5b8cec7a2 intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN.  The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
12b0e03bd2 intel/brw: Use SHADER_OPCODE_SEND for coherent framebuffer reads
We already have a logical opcode and lower to what is basically a send
instruction.  We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
217d56e9b1 intel/brw: Delete fs_visitor::vgrf helper
Just use fs_builder::vgrf instead of the older glsl_type-based one.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Francisco Jerez
6427f16074 intel/brw/gfx12: Setup PS thread payload registers required for ALU-based pixel interpolation.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>
2024-03-20 15:46:44 -07:00
Kenneth Graunke
ea423aba1b intel/brw: Split out 64-bit lowering from algebraic optimizations
We don't necessarily want to split up MOVs for 64-bit addresses into
2x 32-bit MOVs right away, as this makes things like copy propagating
the whole address around harder.  We should do this late, once, while
still doing other algebraic optimizations earlier.
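
The lowering itself just splits the quadword into dword halves; a plain
C++ analogue of the 2x 32-bit MOVs (not the IR pass):

   #include <cstdint>

   /* Moving a 64-bit value is equivalent to moving its low and high dwords. */
   static void mov64_as_2x32(uint64_t src, uint32_t dst[2])
   {
      dst[0] = (uint32_t)(src & 0xffffffffu);   /* low dword  */
      dst[1] = (uint32_t)(src >> 32);           /* high dword */
   }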

fossil-db results for Alchemist show tiny improvements:

   Totals:
   Instrs: 161310502 -> 161310436 (-0.00%); split: -0.00%, +0.00%
   Cycles: 14370605606 -> 14370605159 (-0.00%); split: -0.00%, +0.00%

   Totals from 33 (0.01% of 652298) affected shaders:
   Instrs: 15053 -> 14987 (-0.44%); split: -0.64%, +0.20%
   Cycles: 196947 -> 196500 (-0.23%); split: -0.25%, +0.02%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>
2024-03-20 01:04:17 -07:00
Kenneth Graunke
97bf3d3b2d intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND
There's no need for special handling here, it's just a send message
with a trivial g0 header and descriptor.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>
2024-03-05 11:16:20 +00:00
Rohan Garg
73d98848fa intel/compiler: Xe2+ can do URB load/store with a byte offset
Thanks to Ken for suggesting this URB refactoring change and pointing
out that the LSC can operate at byte-offset granularity.

This should fix the geometry shader test cases where we have more than
32 vertices, since previously we were failing to write the correct
control data bits because of an incorrect write mask.

Shader-db results for Xe2:

total instructions in shared programs: 153475 -> 153437 (-0.02%)
instructions in affected programs: 1374 -> 1336 (-2.77%)
helped: 11
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3
helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70%
95% mean confidence interval for instructions value: -3.92 -2.99
95% mean confidence interval for instructions %-change: -4.10% -2.36%
Instructions are helped.

total loops in shared programs: 140 -> 140 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 16002649 -> 16002329 (<.01%)
cycles in affected programs: 9174 -> 8854 (-3.49%)
helped: 11
HURT: 0
helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32
helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85%
95% mean confidence interval for cycles value: -33.56 -24.62
95% mean confidence interval for cycles %-change: -4.48% -3.08%
Cycles are helped.

total spills in shared programs: 52 -> 52 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 94 -> 94 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

total sends in shared programs: 4240 -> 4240 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Rework: (Sagar)
- Adjust offset/indirect offset calculation.
- Add shader-db results
- Always calculate dword index
- Drop changes for indirect writes

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
2024-03-01 16:11:30 +00:00