An effect similar to the one formerly provided by setting thread
control to "switch" can be achieved now by setting a RegDist of 1 on
the SWSB field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future lowering pass will simulate the same behavior originally
provided by NoDDChk/NoDDClr at the IR level by using appropriate SWSB
annotations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new SEND instruction behaves like the former SENDS instruction.
The original single-payload SEND instruction is gone.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SEND instruction is now four-source. The descriptor is no longer
part of source 1, so avoid touching it to avoid corruption while
initializing the descriptor.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Quite a lot of churn because the encoding of most hardware opcodes has
changed unfortunately.
v2: Split dot-product description fixes to separate patch (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen12, 64 bit immediate constants are loaded in reverse order. Lower
32 bit gets loaded from bit 96-127 and higher 32 bits from 64-95 in
instruction encoding.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These caught a few bugs during the development of this series.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The encoding of almost every instruction field has changed in Gen12,
so this involves adding a Gen12+ bitfield spec to every brw_inst
macro. In addition some new macros are required to handle certain
discontiguous and variable-length fields.
This commit doesn't actually include the Gen12 updated bitfield specs,
only the macros are extended here for reviewability.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Rename FDC() to FFDC() and FDC1() to FDC() for consistency with
the existing F() and FF() macros.
This edge doesn't exist in the original scalar program, but it
represents a potential control flow path the EU will take in cases
where control flow isn't uniform across channels of the same SIMD
thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This edge doesn't exist in the original scalar program, but it
represents a potential control flow path the EU will take in cases
where the condition isn't uniform across channels of the same SIMD
thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Currently only the physical back-edge is represented, which
incidentally also leads to the exit block of the loop, but we need the
direct logical edge in addition for our logical CFG representation to
be complete.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This represents two control flow graphs in the same cfg_t data
structure: The physical CFG that will include all possible control
flow paths the EU can physically take, and the logical CFG restricted
to the control flow paths that exist in the original scalar program.
The latter is a subset of the former because in case of divergence the
SIMD vectorized program will take control flow paths that aren't part
of the original scalar program.
The bblock_link constructor and bblock_t::add_successor() now take a
"kind" parameter that specifies whether the edge is purely physical or
whether it's part of both the logical and physical CFGs (a logical
edge is of course always guaranteed to be in the physical CFG as
well). bblock_t::is_predecessor_of() and ::is_successor_of() also
take a kind parameter specifying which CFG is being queried. The '~>'
notation will be used now in order to represent purely physical edges
in IR dumps.
This commit doesn't actually add nor remove any edges from the CFG
(the only edges marked as purely physical here are the two WHILE loop
ones that already existed). Optimization passes should continue using
the same (incomplete) physical CFG they were using before until
they're fixed to do something smarter in a later commit, so this
shouldn't lead to any functional changes.
v2: Remove tabs from lines changed in this file (Caio).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Having the IR opcodes locked to their hardware representation is risky
because it causes opcodes as different as BRC and IFF to compare equal
at the IR level (luckily the back-end only ever uses one opcode from
each group, right now), and it prevents us from supporting
instructions that change their hardware representation across
generations, which will become a problem on Gen12+ platforms.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change brw_inst_set_opcode() and brw_inst_opcode() to call
brw_opcode_encode/decode() transparently in order to translate between
hardware and IR opcodes, and update the EU compaction code in order to
do the same as needed, so we can eventually drop the one-to-one
correspondence between hardware and IR opcodes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This rewrites the current opcode description tables as a more compact
flat data structure. The purpose is to allow efficient constant-time
look-up by either HW or IR opcode, which will allow us to drop the
hard-coded correspondence between HW and IR opcodes -- See the next
commits for the rationale.
brw_eu.c is now built as C++ source so we can take advantage of
pointers to member in order to make the look-up function work
regardless of the opcode_desc member used as look-up key.
v2: Optimize devinfo struct comparison (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The brw_inst opcode accessors are going away in one of the following
commits. We could potentially replace them with the new helpers that
do opcode remapping, but that would lead to a circular dependency
between brw_inst.h and brw_eu.h. This way we also avoid ordering
issues that can cause the semantics of the ex_desc accessors to change
depending on whether the ex_desc field is set after or before the
opcode instruction field.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is required because SEND message payload sources are fetched
asynchronously by the hardware, which can lead to WaR data corruption
on Gen12+ platforms if not handled specially by the compiler to
guarantee proper synchronization.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
so that drivers don't have to call nir_strip manually.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Create basic aub_context on GEM_CONTEXT_CREATE.
Set it up and submit a context + ring + pphwsp during execbuf
submission, if it has not been initialized yet.
v2: Write the HWSP only once per engine (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2:
- Only dump context if there were no erros (Lionel).
- Store counter for context handles in aub_file (Lionel).
v3:
- Add a comment about aub_context -> GEM context (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to be able to create contexts on demand, and increase the GGTT
as needed for that. Use the aub_map_ggtt() function for that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to reuse it in execlists_setup().
v2: Rename it to write_ggtt_ptes() (Lionel).
v3: Rename it to aub_map_ggtt() (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a case where user first creates image and then later binds it
with memory created from AHW buffer.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Just following the spec. Somewhat unclear whether this applies to NULL
surfaces.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears we never had a test in piglit or deqp sampling from a null
surface...
It turns out this triggers a hang on IVB only. Updating the null
surface format to R32_UINT fixes the hang on ivb and doesn't affect
other platforms, so set it by default for all platforms.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1872
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>