Commit graph

3539 commits

Author SHA1 Message Date
Francisco Jerez
c1feccdd90 intel/fs/gfx20+: Fix surface state address on extended descriptors for NIR scratch intrinsics.
The r0.5 thread payload register contains Surface State Offset bits
[27:6] as bits [31:10], so we need to shift the register right by 4 in
order to get the surface state offset expected in ExBSO mode, which is
the only extended descriptor encoding supported by the UGM shared
function for SS addressing on Xe2+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
2024-06-21 01:49:43 +00:00
Kenneth Graunke
50519598ff intel/brw: Skip discarding the interference graph
We no longer need to reserve registers for constructing spill/fill
messages.  We have split sends and construct message headers in new
temporary registers with a very short lifespan which are simply added
to the existing interference graph as new nodes and allocated via the
normal mechanism.

This means that when we need to spill for the first time, we can avoid
discarding and recomputing the entire interference graph.  We also avoid
needing to recreate all spill candidate information once ra_allocate()
fails, because the graph remains valid, and none of the existing nodes
had any changes to their interference.  The existing spill candidates
remain valid.

This will slightly help improve compile time when needing to spill.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
2024-06-20 09:47:18 +00:00
Kenneth Graunke
29d6264627 intel/brw: Build the scratch header on the fly for pre-LSC systems
Instead of reserving a register to contain the spill header, which
gets marked live for the entire program, we can just emit the ALU
instructions to build it on the fly.  (This is similar to the way
we handle scratch on Alchemist with the newer LSC data port.)

There are a couple of downsides that make this not obviously a win.
First, in order to construct the scratch header on Gfx9-12, we have
to use fields from g0, which will have to remain live anywhere that
scratch access is required.  This could negate the register pressure
benefits of creating the header on the fly.  However, g0 is oft used
in other places anyway, so it may already be there.  Another is that
it's a non-trivial number of ALU instructions to construct the value.
Still, trading lower pressure (so fewer spills, less memory access
and stalls) for more cheap ALU seems like it ought to be a win.

There is another valuable benefit: by not reserving a register, we
eliminate the need to reconstruct the interference graph.  (The next
patch will actually do so.)

shader-db on Icelake shows spills/fills at 54/53 helped, 4/10 hurt,
and an 8% increase in ALU on affected shaders.  Synmark's OglCSDof
(a benchmark that spills) performance remains the same on Alderlake.

fossil-db on Icelake shows a 5.6%/5.1% reduction in spills/fills and a
4% reduction in scratch memory size on affected shaders.  Instruction
counts go up by 11.07%, but cycle estimates only increase by 0.57%.
Assassin's Creed Odyssey and Wolfenstein Youngblood both see 20-30%
reductions in spills/fills, a significant improvement.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
2024-06-20 09:47:18 +00:00
Caio Oliveira
2a9f4618c5 intel/brw: Make component_size() consistent between VGRF and FIXED_GRF
Change so the size rounds up to the next multiple of the horizontal stride like
is done for VGRF.  This was causing an inconsistency in regs_read() -- The original
component_size() calculation for FIXED_GRF excluded any padding at the end but it was
still being discounted by regs_read().

Suggested by Curro.

Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>
2024-06-19 01:33:58 +00:00
Caio Oliveira
8fb70f0746 intel/brw: Add unit tests for scoreboard handling FIXED_GRF with stride
Based on shaders reported in
https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069 and
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29723.  These
currently fail, later patch will enable them.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>
2024-06-19 01:33:58 +00:00
Caio Oliveira
f982d2bb79 intel/brw: Fix typo in DPAS emission code
The enums were mixed up.  Code was working because they were being
used only for their numerical values.

Fixes: e666872c75 ("intel/compiler: Initial bits for DPAS instruction")
Acked-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29762>
2024-06-18 18:25:21 +00:00
Kenneth Graunke
9e750f00c3 intel/brw: Make opt_copy_propagation_defs clean up its own trash
Copy propagation often eliminates all uses of an instruction.  If we
detect that we've done so, we can eliminate the instruction ourselves
rather than leaving it hanging until the next DCE pass.

This saves some CPU time as other passes don't see dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
2af84c2d49 intel/brw: Use the defs-based copy propagation along with the old one
The new def-based pass works better in many cases, and should be less
resource intensive.  However, the limited visibility of the defs-based
pass due to many values not being SSA yet makes it unable to fully
replace the old pass.  Try the new one, and if it can't make progress,
then try the old one.  That way, things will mostly be handled by the
new pass, but everything that was being cleaned up still will be.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
580e1c592d intel/brw: Introduce a new SSA-based copy propagation pass
(Quite a few of the restrictions here are ported from the old pass.)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
9690bd369d intel/brw: Delete old local common subexpression elimination pass
We no longer use this older pass, so there's no need to keep it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
8f09c58ddc intel/brw: Switch to the new defs-based global CSE pass
While the limited visibility due to partial SSA is a downside to the new
pass, it has a huge number of advantages that make it worth switching
over even now.  It's much more efficient, can eliminate redundant memory
loads across blocks, and doesn't generate loads of unnecessary copies
that other passes have to clean up.  This means we also eliminate the
infighting between the old CSE, coalescing, and copy propagation passes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
234c45c929 intel/brw: Write a new global CSE pass that works on defs
This has a number of advantages compared to the pass I wrote years ago:

- It can easily perform either Global CSE or block-local CSE, without
  needing to roll any dataflow analysis, thanks to SSA def analysis.
  This global CSE is able to detect and coalesce memory loads across
  blocks.  Although it may increase spilling a little, the reduction
  in memory loads seems to more than compensate.

- Because SSA guarantees that values are never written more than once,
  the new CSE pass can directly reuse an existing value.  The old pass
  emitted copies at the point where it discovered a value because it
  had no idea whether it'd be mutated later.  This led it to generate
  a ton of trash for copy propagation to clean up later, and also a
  nasty fragility where CSE, register coalescing, and copy propagation
  could all fight one another by generating and cleaning up copies,
  leading to infinite optimization loops unless we were really careful.
  Generating less trash improves our CPU efficiency.

- It uses hash tables like nir_instr_set and nir_opt_cse, instead of
  linearly walking lists and comparing each element.  This is much more
  CPU efficient.

- It doesn't use liveness analysis, which is one of the most expensive
  analysis passes that we have.  Def analysis is cheaper.

In addition to CSE'ing SSA values, we continue to handle flag writes,
as this is a huge source of CSE'able values.  These remain block local.
However, we can simply track the last flag write, rather than creating
entire sets of instruction entries like the old pass.  Much simpler.

The only real downside to this pass is that, because the backend is
currently only partially SSA, it has limited visibility and isn't able
to see all values.  However, the results appear to be good enough that
the new pass can effectively replace the old pass in almost all cases.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
2b30b3bbd4 intel/brw: Print defs in dump_instructions
Like NIR, we print SSA defs as %1, %2, and so on.  The number here is
the VGRF number.  VGRFs that don't correspond to a SSA def remain
printed as vgrf1, vgrf2, and so on.

This makes it much easier to see what values are SSA and which aren't.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Caio Oliveira
08da7edc0e intel/brw: Track the number of uses of each def in def_analysis
Even without a full use list, simply tracking the number of uses will
let us tell "this is the only use of the def" or "we've just replaced
all uses of a def".  It's inexpensive to calculate and will be useful.

(rebased by Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
0d144821f0 intel/brw: Add a new def analysis pass
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
ad9e414aa9 intel/brw: Skip LOAD_PAYLOADs after every texture instruction if possible
This avoids generating a bunch of trash we have to clean up later.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
84219892ad intel/brw: Make gl_SubgroupInvocation lane index loading SSA
Our code to initialize gl_SubgroupInvocation uses multiple instructions
some of which are partial writes.  This makes it difficult to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.

To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation.  (We also expand
it to UD for SIMD8 so there are not partial write issues.)  We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
344d4ee9f0 intel/brw: Make VEC() perform a single write to its destination.
This gathers a number of sources into a contiguous vector register,
typically using LOAD_PAYLOAD.  However, it uses MOV for a single source.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Francisco Jerez
06e4e088a3 intel/brw/xe2+: Use active-thread-only barriers available since Xe2+.
These allow avoiding dead-locks in non-compliant applications that
execute barriers under non-uniform control flow.  They're not expected
to have any major disadvantage so let's enable them unconditionally.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29562>
2024-06-17 16:19:18 -07:00
Alyssa Rosenzweig
15257b65c6 treewide: use nir_metadata_control_flow
Via Coccinelle patch:

    @@
    @@

    -nir_metadata_block_index | nir_metadata_dominance
    +nir_metadata_control_flow

...plus some manual fixups for call sites missed by coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>
2024-06-17 16:28:14 -04:00
Daniel Schürmann
7af16e9f1e nir/shader_info: remove uses_demote
This flag is mostly redundant with uses_discard and was only
introduced to implement demote with LLVM when it didn't have
that intrinsic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
2024-06-17 19:37:16 +00:00
Daniel Schürmann
9b1a748b5e nir: remove nir_intrinsic_discard
The semantics of discard differ between GLSL and HLSL and
their various implementations. Subsequently, numerous application
bugs occurred and SPV_EXT_demote_to_helper_invocation was written
in order to clarify the behavior. In NIR, we now have 3 different
intrinsics for 2 things, and while demote and terminate have clear
semantics, discard still doesn't and can mean either of the two.

This patch entirely removes nir_intrinsic_discard and
nir_intrinsic_discard_if and replaces all occurences either with
nir_intrinsic_terminate{_if} or nir_intrinsic_demote{_if} in the
case that the NIR option 'discard_is_demote' is being set.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
2024-06-17 19:37:16 +00:00
Daniel Schürmann
f3d8bd18dd nir: introduce discard_is_demote compiler option
This new option indicates that the driver emits the same
code for nir_intrinsic_discard and nir_intrinsic_demote.
Otherwise, it is assumed that discard is implemented as
terminate.

spirv_to_nir uses this option in order to directly emit
nir_demote in case of OpKill.

RADV GFX11:
Totals from 3965 (4.99% of 79439) affected shaders:
MaxWaves: 119418 -> 119424 (+0.01%); split: +0.03%, -0.03%
Instrs: 1608753 -> 1620830 (+0.75%); split: -0.18%, +0.93%
CodeSize: 8759152 -> 8785152 (+0.30%); split: -0.18%, +0.48%
VGPRs: 152292 -> 149232 (-2.01%); split: -2.37%, +0.36%
Latency: 9162314 -> 10033923 (+9.51%); split: -0.46%, +9.97%
InvThroughput: 1491656 -> 1493408 (+0.12%); split: -0.10%, +0.22%
VClause: 21424 -> 21452 (+0.13%); split: -0.31%, +0.44%
SClause: 53598 -> 55871 (+4.24%); split: -2.15%, +6.39%
Copies: 90553 -> 90462 (-0.10%); split: -2.91%, +2.81%
Branches: 16283 -> 16311 (+0.17%)
PreSGPRs: 113993 -> 113254 (-0.65%); split: -1.84%, +1.19%
PreVGPRs: 110951 -> 108914 (-1.84%); split: -2.08%, +0.24%
VALU: 963192 -> 963167 (-0.00%); split: -0.01%, +0.01%
SALU: 87926 -> 90795 (+3.26%); split: -2.92%, +6.18%
VMEM: 25937 -> 25936 (-0.00%)
SMEM: 110012 -> 109799 (-0.19%); split: -0.20%, +0.01%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
2024-06-17 19:37:15 +00:00
Lionel Landwerlin
13dc2a28ce intel/fs: fix lower_simd_width for MOV_INDIRECT
MOV_INDIRECT picks one lane from the src[0] and moves it to all lanes
in the destination. Even if we split the instruction, src[0] should
remain identical.

Noticed this while trying to use this instruction in SIMD32. All
current use cases are limited to SIMD8 shaders (or SIMD16 on Xe2). Or
maybe in SIMD32 but with a uniform src[0]. That's we think we've never
seen the issue so far.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28036>
2024-06-14 22:21:26 +00:00
Sviatoslav Peleshko
5ca51156e2 intel/elk: Actually retype integer sources of sampler message payload
According to PRMs:
"All parameters are of type IEEE_Float, except those in the The ld*,
resinfo, and the offu, offv of the gather4_po[_c] instruction message
types, which are of type signed integer."

Currently, we load parameters with the correct types, but use them as send
sources with the default float type, which may confuse passes downstream.
Fix this by actually storing the retyped sources.

Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>
2024-06-12 18:59:17 +00:00
Sviatoslav Peleshko
2358c997f3 intel/brw: Actually retype integer sources of sampler message payload
According to PRMs:
"All parameters are of type IEEE_Float, except those in the The ld*,
resinfo, and the offu, offv of the gather4_po[_c] instruction message
types, which are of type signed integer."

Currently, we load parameters with the correct types, but use them as send
sources with the default float type, which may confuse passes downstream.
Fix this by actually storing the retyped sources.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11118
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>
2024-06-12 18:59:17 +00:00
Kenneth Graunke
f04bb49465 intel/brw: Delete SAD2 and SADA2 opcodes
These were removed with Icelake.  While they technically still exist on
Skylake, which this compiler supports, we have never used these opcodes
in the 14 years we could have done so.  So just scrap them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29665>
2024-06-10 16:47:50 -07:00
Marcin Ślusarz
a1d8837bad anv,intel/compiler/xe2: fill MESH_CONTROL.VPandRTAIndexAutostripEnable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29617>
2024-06-10 15:21:34 +00:00
Kenneth Graunke
3da444b79e intel/brw: Refactor code to commute immediates into legal positions
This will let us reuse this in a new pass shortly.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:12 -07:00
Kenneth Graunke
d45da713e7 intel/brw: Refactor try_constant_propagate()
This will let us reuse the bulk of this code in a new copy propagation
pass without replicating it.  We retain a wrapper function for dealing
with ACP entries, which the new pass won't have.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:10 -07:00
Kenneth Graunke
85aa6f80af intel/brw: Drop BRW_OPCODE_IF from try_constant_propagate
This was for Sandybridge's IF with embedded comparison, which only
existed for a single generation of hardware.  Since the compiler fork,
we no longer support Sandybridge here.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:08 -07:00
Kenneth Graunke
7019bc4469 intel/brw: Drop compiler parameter from try_constant_propagate()
This is unused.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:06 -07:00
Kenneth Graunke
43ab997951 intel/brw: Update instructions_match() to compare more fields
We were missing the following "newer" fields:
- ex_desc
- predicate_trivial
- sdepth
- rcount
- writes_accumulator
- no_dd_clear
- no_dd_check
- check_tdr
- send_is_volatile
- send_ex_desc_scratch
- send_ex_bso
- last_rt
- keep_payload_trailing_zeroes
- has_packed_lod_ai_src

We can actually just check ex_desc and the new "bits" union to handle
most of them with fewer checks.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:03 -07:00
Kenneth Graunke
061da9f748 intel/brw: Make brw_reg::bits publicly accessible from fs_reg
I want to be able to hash an fs_reg, including all the brw_reg fields.
It's easiest to do this if I can use the "bits" union field that
incorporates many of the other ones.

We also move the using declaration for "nr" down because that field was
moved to the second section a while back.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:01 -07:00
Kenneth Graunke
b4a595204b intel/brw: Add a idom_tree::dominates(a, b) helper.
Simpler to use than the existing methods.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:56 -07:00
Kenneth Graunke
e2d9ff8004 intel/brw: Handle scratch address swizzling of constants
Pass in the nir_src and check if it's constant, handling it via CPU-side
arithmetic instead of emitting instructions.  While we can constant fold
these via our optimization passes, we have to do opt_algebraic to fold
the binary operation with constant sources into a MOV of an immediate,
then opt_copy_propagation to put it in the next expression, and so on,
until the entire expression is folded.  This can take several iterations
of the optimization loop, which is inefficient.

For example, gfxbench5/aztec-ruins/normal/7 has load/store_scratch
intrinsics with constant sources, and this patch removes a number of
optimization passes according to INTEL_DEBUG=optimizer.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:54 -07:00
Kenneth Graunke
07745752d6 intel/brw: Skip fs_nir_setup_outputs for compute shaders
There aren't any outputs, so there's no point to doing this work.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:54 -07:00
Kenneth Graunke
fa1564fb87 intel/brw: Recreate GS output registers after EmitVertex
Geometry shaders write outputs multiple times, with EmitVertex()
between them.  The value of output variables becomes undefined after
calling EmitVertex(), so we don't need to preserve those.  This lets
us recreate new registers after each EmitVertex(), assuming we aren't
in control flow, allowing them to have separate live ranges.  It also
means that those registers are more likely to be written once, rather
than having multiple writes, which can make optimization easier.

This is pretty much a total hack, but it's helpful.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:51 -07:00
Rohan Garg
6f6da58315 intel/compiler: fix shuffle generation on LNL
There doesn't seem to be a restriction on the mentioned data types on
LNL anymore. Default to a maximum exec size of SIMD16.

This patch fixes dEQP-VK.subgroups.shuffle.framebuffer.* on LNL

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29504>
2024-06-07 12:51:30 +00:00
Sagar Ghuge
2dba5d484b intel/fs: Adjust destination register size for global atomic on Xe2+
For 16-bit data type, we are padding 16-bit and using 32-bit data type,
so we need to account for the padded portion while calculating the
size_written.

Rework: (Rohan)
- Drop unnecessary fs_builder instance

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>
2024-06-06 00:18:37 +00:00
Sagar Ghuge
55c7b24899 intel/fs: Adjust destination register size for untyped atomic on Xe2+
For 16-bit data type, we are padding 16-bit and using 32-bit data type,
so we need to account for the padded portion while calculating the
size_written.

Rework: (Rohan)
- Drop unnecessary fs_builder instance

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>
2024-06-06 00:18:37 +00:00
Jordan Justen
1fa84d34ef intel/compiler: Don't set size written in brw_lower_logical_sends.cpp
Rework: (Sagar)
- Drop unused variable

Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29271>
2024-06-06 00:18:37 +00:00
Zach Battleman
ecfe8b0f75 intel/brw: update Wa_1805992985 to use workarounds mechanism
Replaced two instances of checking version 11 with the new workaround
mechanism.

Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29560>
2024-06-05 23:45:33 +00:00
Zach Battleman
ddaa7c4221 intel/brw: update comment to accurately reflect intended behavior
Removed mention of Wa_* when referencing an intended harware behavior
since version 12. This will prevent the erroneous usage of the
`intel_needs_workaround` in the future.

Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29559>
2024-06-05 23:23:30 +00:00
Iván Briano
1c6a6349b0 intel/brw: always read LAYER/VIEWPORT from the FS payload
Following on https://gitlab.freedesktop.org/mesa/mesa/-/issues/9811 the
restriction that kept us from using the payload values for non-mesh
cases is gone, so just use the same codepath for everything.
But since we have functions that correctly read those for all gens, use
those instead of the broken hack we had until now.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9796

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29448>
2024-06-05 21:52:51 +00:00
Iván Briano
3d071fe7db intel/brw: add fetch_viewport_index function
Like fetch_render_target_array_index(), it reads the values provided by
the FS payload.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29448>
2024-06-05 21:52:51 +00:00
Sagar Ghuge
415c5ad989 intel/compiler: No need to re-type the destination register
For 16-bit float case handling, intermediate destination register is
already 32-bit wide, we don't have to retype it to 32-bit.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29506>
2024-06-04 18:07:44 +00:00
Lionel Landwerlin
724bb7fa15 brw: better model READ_ARF_REG opcode
This opcode gets translated to 2 ALU instructions with dependency ALU
stall. This change reproduces the FS_OPCODE_PACK_HALF_2x16_SPLIT
values which is another opcode that generates 2 instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
2024-05-31 20:22:27 +00:00
Lionel Landwerlin
ac03cefb28 brw: limit dependencies on SR register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
2024-05-31 20:22:27 +00:00
Lionel Landwerlin
d8b78924c5 brw: use a single virtual opcode to read ARF registers
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
2024-05-31 20:22:27 +00:00