Commit graph

236 commits

Author SHA1 Message Date
Caio Oliveira
0fcce2722f brw: Add brw_send_inst
Move all the SEND specific fields from brw_inst into brw_send_inst.
This new instruction kind will contain all variants of SENDs plus the
virtual opcodes that were already relying on those SEND fields.

Use the `as_send()` helper to go from a brw_inst into the brw_send_inst
when applicable.  Some of the code was changed to use the brw_send_inst
type directly.
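
A minimal sketch of the kind-tagged downcast that an `as_send()`-style helper implies. The type and field names here (`inst`, `send_inst`, `inst_kind`, `sfid`, `mlen`) are simplified stand-ins, and returning a null pointer for non-SEND instructions is an assumption; the message above does not say how the real helper behaves in that case.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the real Mesa types.
enum class inst_kind { BASE, SEND };

struct send_inst;

struct inst {
   inst_kind kind = inst_kind::BASE;

   // Returns the SEND view of this instruction, or nullptr if it is
   // not a SEND (assumed behavior, for illustration only).
   send_inst *as_send();
};

struct send_inst : inst {
   unsigned sfid = 0;   // illustrative SEND-only fields
   unsigned mlen = 0;

   send_inst() { kind = inst_kind::SEND; }
};

inline send_inst *inst::as_send()
{
   return kind == inst_kind::SEND ? static_cast<send_inst *>(this) : nullptr;
}

// Example consumer: only SENDs carry a message length.
inline unsigned message_length(inst &i)
{
   send_inst *s = i.as_send();
   return s ? s->mlen : 0;
}
```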

Until other kinds are added, all the instructions are allocated the same
amount of space as brw_send_inst.  This ensures that all
brw_transform_inst() calls are still valid.  This will change after
a few patches so that BASE instructions can use less memory.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:01 +00:00
Caio Oliveira
71c23c6722 brw: Add brw_builder::URB_READ and URB_WRITE helpers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:00 +00:00
Caio Oliveira
f92116832f brw: Add brw_builder::SEND() helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:59 +00:00
Francisco Jerez
531a34c7dd intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.
The current register allocation loop attempts to use a sequence of
pre-RA scheduling heuristics until register allocation is successful.
The sequence of scheduling heuristics is expected to be increasingly
aggressive at reducing the register pressure of the program (at a
performance cost), so that the instruction ordering chosen gives the
lowest latency achievable with the register space available.

Unfortunately that approach doesn't consistently give the best
performance on xe3+, since on recent platforms a schedule with higher
latency may actually give better performance if its lower register
pressure allows the use of a lower number of VRT register blocks which
allows the EU to run more threads in parallel.

This means that on xe3+ the scheduling mode with highest performance
is fundamentally dependent on the specific scenario (in particular
where in the thread count-register use curve the program is at, and
how effective the scheduler heuristics are at reducing latency for
each additional block of GRFs used), so it isn't possible to construct
a fixed sequence of the existing heuristics guaranteed to be ordered
by decreasing performance.  In order to find the scheduling heuristic
with better performance we have to run multiple of them prior to
register allocation and do some arithmetic to account for the effect
on parallelism of the register pressure estimated in each case, in
order to decide which heuristic will give the best performance.
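
The arithmetic described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the constants are placeholders rather than real xe3 figures, and "threads per cycle" is just one plausible figure of merit, not necessarily the estimate the commit uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All constants below are made-up placeholders, not real xe3 figures.
constexpr unsigned kGrfBlock = 32;       // GRFs per allocation block
constexpr unsigned kRegFileGrfs = 512;   // GRFs shared by the threads of one EU
constexpr unsigned kMaxThreads = 10;     // cap on threads per EU

struct candidate {
   unsigned cycles;     // estimated latency of the schedule
   unsigned pressure;   // estimated peak GRF use
};

// Threads that fit once the pressure is rounded up to whole blocks.
inline unsigned threads_for(unsigned pressure)
{
   assert(pressure > 0 && pressure <= kRegFileGrfs);
   unsigned grfs = (pressure + kGrfBlock - 1) / kGrfBlock * kGrfBlock;
   unsigned threads = kRegFileGrfs / grfs;
   return threads < kMaxThreads ? threads : kMaxThreads;
}

// Index of the candidate with the best threads-per-cycle estimate.
inline std::size_t pick_best(const std::vector<candidate> &c)
{
   assert(!c.empty());
   std::size_t best = 0;
   for (std::size_t i = 1; i < c.size(); i++) {
      // threads_i / cycles_i > threads_best / cycles_best,
      // cross-multiplied to stay in integers.
      unsigned long long lhs = 1ull * threads_for(c[i].pressure) * c[best].cycles;
      unsigned long long rhs = 1ull * threads_for(c[best].pressure) * c[i].cycles;
      if (lhs > rhs)
         best = i;
   }
   return best;
}
```

With these placeholder numbers, a schedule that is 20% slower but halves the register use still wins, because it doubles the number of threads that fit.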

This sounds costly but it is similar to the approach taken by
brw_allocate_registers() when unable to allocate without spills in
order to decide which scheduling heuristic to use in order to minimize
the number of spills.  In cases where that happens on xe3+ the
scheduling runs introduced here don't add to the scheduling runs done
to find the heuristic with minimum register pressure; we attempt to
determine the heuristic with lowest pressure and best performance in
the same loop, and then use one or the other depending on whether
register allocation succeeds without spills.

Significantly improves performance on PTL of the following Traci test
cases (4 iterations, 5% significance):

Nba2K23-trace-dx11-2160p-ultra:                     4.48% ±0.38%
Fortnite-trace-dx11-2160p-epix:                     1.61% ±0.28%
Superposition-trace-dx11-2160p-extreme:             1.37% ±0.26%
PubG-trace-dx11-1440p-ultra:                        1.15% ±0.29%
GtaV-trace-dx11-2160p-ultra:                        0.80% ±0.24%
CitiesSkylines2-trace-dx11-1440p-high:              0.68% ±0.19%
SpaceEngineers-trace-dx11-2160p-high:               0.65% ±0.34%

The compile-time cost of shader-db increases significantly by 3.7%
after this commit (15 iterations, 5% significance), the compile-time
of fossil-db doesn't change significantly in my setup.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:57 +00:00
Francisco Jerez
0e802cecba intel/brw: Make sure we don't use stale analysis after inst. order restore in brw_allocate_registers().
Do invalidate_analysis() from restore_instruction_order() to make sure
we don't re-use stale analysis pass results if the user forgets to
call invalidate_analysis() explicitly.
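
The idea can be sketched as below. The `shader` struct and its members are hypothetical simplifications: a single flag stands in for all cached analysis results, and a vector of integers stands in for the instruction list.

```cpp
#include <cassert>
#include <vector>

struct shader {
   std::vector<int> order;        // current instruction order
   std::vector<int> saved_order;  // snapshot taken before scheduling
   bool analysis_valid = false;   // stands in for all cached analyses

   void invalidate_analysis() { analysis_valid = false; }

   void save_instruction_order() { saved_order = order; }

   void restore_instruction_order()
   {
      assert(saved_order.size() == order.size());
      order = saved_order;
      // Invalidate here so callers cannot forget to do it.
      invalidate_analysis();
   }
};
```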

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:57 +00:00
Francisco Jerez
35ac517780 intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.
This defines a new pre-RA scheduling mode similar to BRW_SCHEDULE_PRE
but more aggressive at optimizing for minimum latency rather than
minimum register usage.  The main motivation is that on recent xe3
platforms we use a register allocation heuristic that packs variables
more tightly at the bottom of the register file instead of the
round-robin heuristic we used on previous platforms, since as a result
of VRT there is a parallelism penalty when a program uses more GRF
registers than necessary.  Unfortunately the xe3 tight-packing
heuristic severely constrains the work of the post-RA scheduler due to
the false dependencies introduced during register allocation, so we
can do a better job by making the scheduler aware of instruction
latencies before the register allocator introduces any false
dependencies.

This can lead to higher register pressure, but only when the scheduler
decides it could save cycles by extending a live range.  It makes
sense to preserve the preexisting BRW_SCHEDULE_PRE as a separate mode
since some workloads can still benefit from neglecting latencies
pre-RA due to the trade-off mentioned between parallelism and GRF use.
A future commit will introduce a more accurate estimate of the
expected relative performance of BRW_SCHEDULE_PRE
vs. BRW_SCHEDULE_PRE_LATENCY taking into account this trade-off.

In theory this could also be helpful on earlier pre-xe3 platforms, but
the benefit should be significantly smaller due to the different RA
heuristic so it hasn't been tested extensively pre-xe3.

The following Traci tests are improved significantly by this change on
PTL (nearly all tests that run on my system are affected positively):

Ghostrunner2-trace-dx11-1440p-ultra:                7.12% ±0.36%
SpaceEngineers-trace-dx11-2160p-high:               5.77% ±0.43%
HogwartsLegacy-trace-dx12-1080p-ultra:              4.40% ±0.03%
Naraka-trace-dx11-1440p-highest:                    3.06% ±0.43%
MetroExodus-trace-dx11-2160p-ultra:                 2.26% ±0.60%
Fortnite-trace-dx11-2160p-epix:                     2.12% ±0.53%
Nba2K23-trace-dx11-2160p-ultra:                     1.98% ±0.30%
Control-trace-dx11-1440p-high:                      1.93% ±0.36%
GodOfWar-trace-dx11-2160p-ultra:                    1.62% ±0.47%
TotalWarPharaoh-trace-dx11-1440p-ultra:             1.55% ±0.18%
MountAndBlade2-trace-dx11-1440p-veryhigh:           1.51% ±0.37%
Destiny2-trace-dx11-1440p-highest:                  1.44% ±0.34%
GtaV-trace-dx11-2160p-ultra:                        1.26% ±0.27%
ShadowTombRaider-trace-dx11-2160p-ultra:            1.10% ±0.58%
Borderlands3-trace-dx11-2160p-ultra:                0.95% ±0.43%
TerminatorResistance-trace-dx11-2160p-ultra:        0.87% ±0.22%
BaldursGate3-trace-dx11-1440p-ultra:                0.84% ±0.28%
CitiesSkylines2-trace-dx11-1440p-high:              0.82% ±0.22%
PubG-trace-dx11-1440p-ultra:                        0.72% ±0.37%
Palworld-trace-dx11-1080p-med:                      0.71% ±0.26%
Superposition-trace-dx11-2160p-extreme:             0.69% ±0.19%

The compile-time cost of shader-db increases significantly by 1.85%
after this commit (14 iterations, 5% significance), the compile-time
of fossil-db doesn't change significantly in my setup.

v2: Addressed interaction with 81594d0db1,
    since the code that calculates deps, delays and exits is no longer
    mode-independent after this change.  Instead of reverting that
    commit (which is non-trivial and would have a greater compile-time
    hit) simply reconstruct the scheduler object during the transition
    between BRW_SCHEDULE_PRE_LATENCY and any other PRE mode that
    doesn't require instruction latencies.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:55 +00:00
Caio Oliveira
9d53e27579 intel/brw: Remove brw_shader::import_uniforms()
The brw_shader::uniforms now is derived from the nir_shader.  The
only exception is compute shaders for older Gfx versions, so we
move the adjust logic for that.

The benefit here is untangling the code for compilation variants,
which previously had to keep track of the first variant compiled in
order to, in most cases, copy an integer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira
b8a35a8a27 brw: Pass per_primitive_offset in brw_shader_params
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira
6ca9021758 brw: Add brw_shader_params
And unify the initialization code for brw_shader.  Avoid passing
brw_compile_params since for a single compilation we might have
multiple shaders (the case for BS stage).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:18 +00:00
Calder Young
c7e48f79b7 brw,anv: Reduce UBO robustness size alignment to 16 bytes
Instead of being encoded as a contiguous 64-bit mask of individual registers,
the robustness information is now encoded as a vector of up to 4 bytes that
represent the limits of each of the pushed UBO ranges in 16 byte units.
Some buggy Direct3D workloads are known to depend on a robustness alignment
as low as 16 bytes to work properly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
2025-08-21 09:04:55 +00:00
Lionel Landwerlin
2281e88381 brw: make assign_curb_setup visible in optimizer debug
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
2025-08-21 09:04:54 +00:00
Lionel Landwerlin
df37c7ca74 brw: fix analysis dirtying with pulled constants
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5c17299084 ("brw: enable A64 pulling of push constants")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
2025-08-21 09:04:53 +00:00
Lionel Landwerlin
c871a62a75 brw: move URB channel mask shifting to the lowering pass
For example Xe2 uses the LSC and doesn't need the shifting, so let's
just apply it where it's needed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
2025-08-13 12:01:49 +00:00
Kenneth Graunke
47fe9d28e7 brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many
This introduces enums for SHADER_OPCODE_SEND[_GATHER] sources,
similar to what we've done for most of the newer logical opcodes.  This
allows us to use actual names for sources rather than remembering their
order, or leaving ourselves comments like /* ex_desc */ all over.  It
will also make it easier to add or reorder sources in the future.

While we're at it, we also standardize on the number of sources.
Previously, we allowed SHADER_OPCODE_SEND to have either 3 (monosend) or
4 (split send) sources, but this is mostly for haphazard historical
reasons.  We now specify all sources every time, eliminating the need
for careful inst->source checks before accessing the last source.
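
As a rough illustration of named, fixed-count sources: the enumerator names and the `reg` / `send` types below are invented for this sketch and may not match what Mesa actually defines.

```cpp
#include <array>
#include <cassert>

// Illustrative names only; the actual enumerators in Mesa may differ.
enum send_src {
   SRC_DESC,       // message descriptor
   SRC_EX_DESC,    // extended descriptor
   SRC_PAYLOAD1,   // first payload
   SRC_PAYLOAD2,   // second payload (left null when unused)
   NUM_SEND_SRCS,
};

struct reg {
   int nr = -1;    // -1 marks a null register in this sketch
   bool is_null() const { return nr < 0; }
};

struct send {
   // Always sized for every source, so indexing never needs a count check.
   std::array<reg, NUM_SEND_SRCS> src{};
};

inline void set_src(send &s, send_src which, reg r)
{
   assert(which < NUM_SEND_SRCS);
   s.src[which] = r;
}

inline bool has_second_payload(const send &s)
{
   return !s.src[SRC_PAYLOAD2].is_null();
}
```

Reading `src[SRC_EX_DESC]` documents itself in a way that `src[1] /* ex_desc */` does not, and reordering the enum does not require touching the call sites.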

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
2025-08-08 22:12:08 +00:00
Qiang Yu
260bdad074 all: rename gl_shader_stage_is_rt to mesa_shader_stage_is_rt
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:41 +08:00
Qiang Yu
b27c8c9eb8 all: rename gl_shader_stage_is_mesh to mesa_shader_stage_is_mesh
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:41 +08:00
Qiang Yu
7a91473192 all: rename gl_shader_stage_is_compute to mesa_shader_stage_is_compute
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:41 +08:00
Marek Olšák
db26597f8d intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility
NIR is going to use exec_node/list without the C++ code, and may switch to
a different linked list implementation in the future.

GLSL is going to use ir_exec_node/list, which we want to keep private
for GLSL, so that we can change it easily.

Thus, it's better to fork the C++ version of list.h for Intel.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36425>
2025-07-31 20:23:02 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
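
For illustration, a one-argument macro of this shape might be written as follows. This is a sketch for GCC/Clang; the macro body is an assumption and is not copied from src/util/macros.h, and `stage` / `stage_name` are hypothetical.

```cpp
#include <cassert>

// Sketch for GCC/Clang; the body is an assumption, not copied from Mesa.
// Debug builds assert with the message, release builds only get the hint.
#define UNREACHABLE(str)       \
   do {                        \
      assert(!"" str);         \
      __builtin_unreachable(); \
   } while (0)

enum class stage { VERTEX, FRAGMENT };

inline const char *stage_name(stage s)
{
   switch (s) {
   case stage::VERTEX:   return "VS";
   case stage::FRAGMENT: return "FS";
   }
   UNREACHABLE("invalid shader stage");
}
```

The upper-case name leaves the zero-argument standard `unreachable()` untouched, so both can coexist in one translation unit.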

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Caio Oliveira
ac2b072312 brw: Add more specific brw_builder helpers
Replace uses of brw_builder::at() with various more descriptive
variants.  Use block pointer from instruction when possible.

A couple of special cases remained and will be handled in separate patches.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
2025-07-19 17:49:47 +00:00
Caleb Callaway
e7454f5318 intel/debug: shader dump filter
v2: Fixes filtering for various brw shader dump logic

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35061>
2025-05-23 19:57:02 +00:00
Lionel Landwerlin
18bbcf9a63 intel: introduce new VUE layout for separate compiled shader with mesh
Mesh shaders have a per-vertex block in the URB pretty much identical to the
VUE format. Let's just reuse that concept to do all of our layout in
the payload attribute registers. This will ensure that we have
consistent VUE layout between Mesh & non-Mesh pipelines.

We need a new way of laying out the VUE though as we have to
accommodate a HW constraint of maximum (per-primitive + per-vertex) of
32 varyings. This means we cannot have 2 locations in the payload for
things like PrimitiveID which can come from either the per-primitive
or the per-vertex block. The new layout places the PrimitiveID at the
end of the per-vertex attributes and shrinks the delivery dynamically
if the mesh stage is active. The shader is compiled with a
MOV_INDIRECT to read the PrimitiveID from the right location in the
attributes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Caio Oliveira
a6b0783375 brw: Use brw_ip_ranges in scheduling / regalloc
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:51 +00:00
Caio Oliveira
10660f5adf brw: Add analysis for block IP ranges
Calculate the IP ranges of the shader as an analysis pass.  This will
later replace the existing tracking of start_ip/end_ip as the blocks are
changed (and the defer/adjust scheme to avoid too much churn when that
happens).
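
A minimal sketch of computing block IP ranges on demand instead of tracking them incrementally. The `ip_range` struct and `compute_ip_ranges` function are hypothetical, the inclusive end is an assumption, and per-block instruction counts stand in for the real CFG.

```cpp
#include <cassert>
#include <vector>

struct ip_range {
   int start;   // IP of the first instruction in the block
   int end;     // IP of the last instruction in the block (inclusive)
};

// Derives each block's IP range from the per-block instruction counts,
// numbering instructions consecutively across the whole shader.
inline std::vector<ip_range> compute_ip_ranges(const std::vector<int> &counts)
{
   std::vector<ip_range> ranges;
   ranges.reserve(counts.size());
   int ip = 0;
   for (int n : counts) {
      assert(n > 0);   // this sketch assumes no empty blocks
      ranges.push_back({ip, ip + n - 1});
      ip += n;
   }
   return ranges;
}
```

Because the ranges are recomputed from scratch, a pass that adds or removes instructions only has to invalidate the analysis rather than patch every later block.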

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Caio Oliveira
fd6045cca9 brw: Track total_instructions in a shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34012>
2025-03-29 00:25:50 +00:00
Lionel Landwerlin
4db4bd1d04 brw: always write the VUE header
In 35df3925ca ("brw: ensure VUE header writes in HS/DS/GS stages") I
misread the PRMs and thought that the VF would initialize the header.

What actually happens is that the VF does not write valid values in
there and the PRMs explicitly say that the VS shader should overwrite
whatever is in there.

We could avoid writing the header in some cases when no HW is going to
read back the header. For example with rendering disabled through
3DSTATE_STREAMOUT::RenderingDisable. But those cases are dynamic and
the compiler is not able to tell. So just always write the header.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 35df3925ca ("brw: ensure VUE header writes in HS/DS/GS stages")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12880
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34211>
2025-03-27 07:42:23 +00:00
Lionel Landwerlin
35df3925ca brw: ensure VUE header writes in HS/DS/GS stages
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12820
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34041>
2025-03-13 16:06:01 +00:00
Caio Oliveira
8e2a7cb42d brw: Embed at_end() inside brw_builder(brw_shader *) constructor
All remaining uses of that constructor would also use at_end(),
and vice-versa.  So just implement that behavior in the constructor
itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
2025-03-06 23:33:38 +00:00
Kenneth Graunke
88309a9818 brw: Rename shared function enums for clarity
Our name for this enum was brw_message_target, but it's better known as
shared function ID or SFID.  Call it brw_sfid to make it easier to find.

Now that brw only supports Gfx9+, we don't particularly care whether
SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5.  Also, the LSC SFIDs
were confusingly tagged "GFX12" but aren't available on Gfx12.0; they
were introduced with Alchemist/Meteorlake.

GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing.  It sounds
like the SFID to use for the sampler on Gfx6+, however it has nothing to
do with the sampler at all.  BRW_SFID_SAMPLER remains the sampler SFID.
On Haswell, we ran out of messages on the main data cache data port, and
so they introduced two additional ones, for more messages.  The modern
Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2.  I think
the "sampler" name came from some idea about reorganizing messages that
never materialized (instead, the LSC came as a much larger cleanup).

Recently we've adopted the term "HDC" for the legacy data cluster, as
opposed to "LSC" for the modern Load/Store Cache.  To make clear which
SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2.

We were also citing the G45, Sandybridge, and Ivybridge PRMs for a
compiler that supports none of those platforms.  Cite modern docs.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>
2025-02-27 08:49:24 +00:00
Caio Oliveira
ff44f4d278 intel/brw: Update outdated comments
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>
2025-02-11 09:13:28 +00:00
Caio Oliveira
cf3bb77224 intel/brw: Rename fs_visitor to brw_shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>
2025-02-11 09:13:28 +00:00
Caio Oliveira
352a63122f intel/brw: Rename files brw_fs.cpp/h to brw_shader.cpp/h
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>
2025-02-11 09:13:28 +00:00
Caio Oliveira
9b0d359737 intel/brw: Move fs_inst implementation code together
Move them to brw_inst.h/cpp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33114>
2025-01-31 00:57:20 +00:00
Caio Oliveira
650ec7169d intel/brw: Add SHADER_OPCODE_SEND_GATHER
Starting in Xe3, there's a variant of SEND that takes the
register numbers from the ARF scalar register, and doesn't
require them to be contiguous.  The new opcode added here
represents that kind of SEND.

To make the original sources still reachable, we keep them
around during the IR, just ignoring them at generator time.
This allows the software scoreboard to properly reason about the
dependencies without trying to decode the contents of ARF
scalar register being used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Lionel Landwerlin
0a5bdf1199 brw: add infra to make use of the address register in the IR
This limits the address register to simple cases inside a block.

Validation ensures that the address register is only written once and
read once.

Instruction scheduling makes sure that instructions using the address
register in the generator are not scheduled while there is a usage of
the register in the IR.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Caio Oliveira
3ca6fa7487 intel/brw: Gather brw_reg related implementations in brw_reg.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32800>
2024-12-30 18:26:59 +00:00
Kenneth Graunke
02482604e5 intel/brw: Delete old-style surface and A64 message opcodes
These have now been replaced by the MEMORY_*_LOGICAL opcodes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
d5f38be713 intel/brw: Introduce new MEMORY_*_LOGICAL opcodes
This is a new unified set of opcodes for memory access loosely patterned
after the new LSC-style data port messages introduced on Alchemist GPUs.

Rather than creating an opcode for every type of memory access, it has
only three opcodes: load, store, and atomic.  It has various sources to
indicate the rest:

- Binding type (raw pointer, pointer to surface state, or BT index)
- Address size (A64, A32, A16)
- Data size (bit size, number of components)
- Opcode (atomic opcode, or LOAD/STORE vs. LOAD_CMASK/STORE_CMASK)
- Mode (typed vs. untyped vs. shared-local vs. scratch)
- Address (and its dimensionality)
- Data (0 for loads, 1 for stores, 2 for atomics)
- Whether we want block access

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Caio Oliveira
c92b8a802e intel/brw: Move remaining compile stages to their own files
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
71ccf8e4cd intel/brw: Rename fs_reg_* helpers to brw_reg_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
e4f37c6ab9 intel/brw: Move most member functions from fs_reg to brw_reg
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:18 +00:00
Kenneth Graunke
f04bb49465 intel/brw: Delete SAD2 and SADA2 opcodes
These were removed with Icelake.  While they technically still exist on
Skylake, which this compiler supports, we have never used these opcodes
in the 14 years we could have done so.  So just scrap them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29665>
2024-06-10 16:47:50 -07:00
Iván Briano
a9f24fb5f1 intel/brw: fix subgroup size of geometry stages for lnl+
Fixes dEQP-VK.subgroups.size_control.*allow_varying_subgroup_size* and
maybe others checking subgroup size.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29177>
2024-05-14 23:13:37 +00:00
Kenneth Graunke
ed3e4c16dc intel/brw: Do not create empty basic blocks when removing instructions
If there's only a single instruction in a basic block, then removing it
would create an empty block.  We seem to have trouble representing those
as there are no instructions with an IP inside the block; several places
mess up connections.  While most blocks end in control flow instructions
(which are rarely eliminated), ones preceding a DO instruction may end
in an ordinary instruction.  This makes such blocks tricky to merge with
adjacent blocks - they may be between loops.  Any optimization pass
may find such an instruction and want to eliminate it, and most of them
are unprepared to perform such CFG link surgery.  Nor do we want to make
every pass aware of this issue.

To work around this, we simply replace an instruction with a NOP when
removing it from a block containing only that instruction, leaving the
block in place.
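
The workaround can be sketched as follows, with a `std::list` of opcodes standing in for a basic block. The `block`, `opcode` and `remove_inst` names are hypothetical.

```cpp
#include <cassert>
#include <list>

enum class opcode { NOP, MOV, ADD };

struct block {
   std::list<opcode> insts;
};

// Removes *it from the block; a block's only instruction is turned into
// a NOP instead, so the block never becomes empty.
inline void remove_inst(block &b, std::list<opcode>::iterator it)
{
   assert(!b.insts.empty());
   if (b.insts.size() == 1)
      *it = opcode::NOP;
   else
      b.insts.erase(it);
}
```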

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:39 -07:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
007d891239 intel/brw: Use newer brw_type_is_* shorter names
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
9d8f2c4421 intel/brw: Rework BRW_REGISTER_TYPE's representation semantics
In ancient days, we directly used the hardware register type encodings
throughout the compiler.  As more GPU generations came out, encodings
shifted, and we moved to an abstract enum that we could encode/decode
to a particular GPU's hardware encoding.  But there was no particular
meaning behind any particular value.

One downside to this approach is that we end up with switch statements
galore.  Want to know a type's size?  Switch.  Convert an unsigned type
to a signed one?  Switch.  Get a type with the same base type, but
different bit size?  Switch.  This is both inefficient and inconvenient.

In contrast, nir_alu_type takes a nicer approach - the type encoding has
certain bits representing the base type, and others encoding the size of
the type.  Switching base types or sizes is a simple matter of masking
out the relevant field and substituting a different one.

Tigerlake's encoding adopts a similar approach: two bits represent the
size as a 2-bit unsigned number n, where the bit size is (8 * 2^n).
Two more bits represent the base type.  Past encodings were a bit ad hoc
as new data types were added over time, but Gfx12 is organized (mostly).

This patch converts our brw_reg_type enum over to a new system that's
patterned after the Tigerlake style (for easy conversion) while
deviating in a few ways that make our vector immediate type size
handling simpler.  Should we add additional base types, we're likely
to continue deviating.  Still, converting is much simpler.

Type size calculations (which are performed all the time) are now a
simple mask and shift, instead of a switch.
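
To make the mask-and-shift idea concrete, here is a sketch with a made-up bit layout: bits [1:0] hold n, so that the bit size is 8 * 2^n, and bits [3:2] hold the base type. The field positions, enumerator values and helper names are assumptions for illustration and are not the actual brw_reg_type encoding.

```cpp
#include <cassert>
#include <cstdint>

using std::uint8_t;

// Hypothetical layout: bits [1:0] hold n (bit size = 8 << n),
// bits [3:2] hold the base type.  Not the actual brw_reg_type values.
enum base_type : uint8_t { BASE_UINT = 0, BASE_SINT = 1, BASE_FLOAT = 2 };

constexpr uint8_t SIZE_MASK = 0x3;
constexpr uint8_t BASE_SHIFT = 2;
constexpr uint8_t BASE_MASK = 0x3 << BASE_SHIFT;

inline uint8_t make_type(base_type base, unsigned log2_bytes)
{
   assert(log2_bytes <= SIZE_MASK);
   return static_cast<uint8_t>((base << BASE_SHIFT) | log2_bytes);
}

// Size queries are a mask and a shift, with no switch statement.
constexpr unsigned type_size_bits(uint8_t t)  { return 8u << (t & SIZE_MASK); }
constexpr unsigned type_size_bytes(uint8_t t) { return 1u << (t & SIZE_MASK); }

// Same size, different base type: replace one field, keep the other.
constexpr uint8_t with_base(uint8_t t, base_type base)
{
   return static_cast<uint8_t>((t & ~BASE_MASK) | (base << BASE_SHIFT));
}
```

Under this layout a 32-bit float is `make_type(BASE_FLOAT, 2)`, and turning an unsigned type into the signed type of the same width is a single call to `with_base`.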

We also adopt the name BRW_TYPE_* instead of BRW_REGISTER_TYPE_* because
it's much shorter and easier to type.  Similarly, we create new helper
functions named brw_type_* for working with these types, with a cleaner
naming convention.  Legacy names still exist but will be dropped over
the next few patches as pieces get cleaned up.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
c45e235df5 intel/brw: Drop NF type support
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator.  On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.

We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.

Since this existed only for a short time, and had very limited utility,
we drop it from the compiler.  One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00