On Gen12, 64 bit immediate constants are loaded in reverse order. Lower
32 bit gets loaded from bit 96-127 and higher 32 bits from 64-95 in
instruction encoding.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These caught a few bugs during the development of this series.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The encoding of almost every instruction field has changed in Gen12,
so this involves adding a Gen12+ bitfield spec to every brw_inst
macro. In addition some new macros are required to handle certain
discontiguous and variable-length fields.
This commit doesn't actually include the Gen12 updated bitfield specs,
only the macros are extended here for reviewability.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Rename FDC() to FFDC() and FDC1() to FDC() for consistency with
the existing F() and FF() macros.
This edge doesn't exist in the original scalar program, but it
represents a potential control flow path the EU will take in cases
where control flow isn't uniform across channels of the same SIMD
thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This edge doesn't exist in the original scalar program, but it
represents a potential control flow path the EU will take in cases
where the condition isn't uniform across channels of the same SIMD
thread.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Currently only the physical back-edge is represented, which
incidentally also leads to the exit block of the loop, but we need the
direct logical edge in addition for our logical CFG representation to
be complete.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This represents two control flow graphs in the same cfg_t data
structure: The physical CFG that will include all possible control
flow paths the EU can physically take, and the logical CFG restricted
to the control flow paths that exist in the original scalar program.
The latter is a subset of the former because in case of divergence the
SIMD vectorized program will take control flow paths that aren't part
of the original scalar program.
The bblock_link constructor and bblock_t::add_successor() now take a
"kind" parameter that specifies whether the edge is purely physical or
whether it's part of both the logical and physical CFGs (a logical
edge is of course always guaranteed to be in the physical CFG as
well). bblock_t::is_predecessor_of() and ::is_successor_of() also
take a kind parameter specifying which CFG is being queried. The '~>'
notation will be used now in order to represent purely physical edges
in IR dumps.
This commit doesn't actually add nor remove any edges from the CFG
(the only edges marked as purely physical here are the two WHILE loop
ones that already existed). Optimization passes should continue using
the same (incomplete) physical CFG they were using before until
they're fixed to do something smarter in a later commit, so this
shouldn't lead to any functional changes.
v2: Remove tabs from lines changed in this file (Caio).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Having the IR opcodes locked to their hardware representation is risky
because it causes opcodes as different as BRC and IFF to compare equal
at the IR level (luckily the back-end only ever uses one opcode from
each group, right now), and it prevents us from supporting
instructions that change their hardware representation across
generations, which will become a problem on Gen12+ platforms.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change brw_inst_set_opcode() and brw_inst_opcode() to call
brw_opcode_encode/decode() transparently in order to translate between
hardware and IR opcodes, and update the EU compaction code in order to
do the same as needed, so we can eventually drop the one-to-one
correspondence between hardware and IR opcodes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This rewrites the current opcode description tables as a more compact
flat data structure. The purpose is to allow efficient constant-time
look-up by either HW or IR opcode, which will allow us to drop the
hard-coded correspondence between HW and IR opcodes -- See the next
commits for the rationale.
brw_eu.c is now built as C++ source so we can take advantage of
pointers to member in order to make the look-up function work
regardless of the opcode_desc member used as look-up key.
v2: Optimize devinfo struct comparison (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The brw_inst opcode accessors are going away in one of the following
commits. We could potentially replace them with the new helpers that
do opcode remapping, but that would lead to a circular dependency
between brw_inst.h and brw_eu.h. This way we also avoid ordering
issues that can cause the semantics of the ex_desc accessors to change
depending on whether the ex_desc field is set after or before the
opcode instruction field.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is required because SEND message payload sources are fetched
asynchronously by the hardware, which can lead to WaR data corruption
on Gen12+ platforms if not handled specially by the compiler to
guarantee proper synchronization.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
so that drivers don't have to call nir_strip manually.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Create basic aub_context on GEM_CONTEXT_CREATE.
Set it up and submit a context + ring + pphwsp during execbuf
submission, if it has not been initialized yet.
v2: Write the HWSP only once per engine (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2:
- Only dump context if there were no erros (Lionel).
- Store counter for context handles in aub_file (Lionel).
v3:
- Add a comment about aub_context -> GEM context (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to be able to create contexts on demand, and increase the GGTT
as needed for that. Use the aub_map_ggtt() function for that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to reuse it in execlists_setup().
v2: Rename it to write_ggtt_ptes() (Lionel).
v3: Rename it to aub_map_ggtt() (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a case where user first creates image and then later binds it
with memory created from AHW buffer.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Just following the spec. Somewhat unclear whether this applies to NULL
surfaces.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears we never had a test in piglit or deqp sampling from a null
surface...
It turns out this triggers a hang on IVB only. Updating the null
surface format to R32_UINT fixes the hang on ivb and doesn't affect
other platforms, so set it by default for all platforms.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1872
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're missing the offset of the slice in the subslice mask...
This worked for most platforms that don't have first slice fused off
because we would reread the same mask from slice0 again and again...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1869
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Not much to do to enable this, just make sure to always write to the
GGTT :)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On 64 bits platforms, some atomic operations like __sync_fetch_and_add()
have constant time, but on 32 bits platforms they are implemented with a
loop and might take much longer.
Additionally, it seems like if their operands are not aligned to 64
bits, they also require extra memory accesses. From the Intel
Architecture's Developer Manual Vol. 1, 4.1.1:
"A word or doubleword operand that crosses a 4-byte boundary or a
quadword operand that crosses an 8-byte boundary is considered
unaligned and requires two separate memory bus cycles for access."
Forcing the u64 field to be aligned to 64 bits seems to make the unit
tests that are stressing this finish much faster.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v1 by Topi Pohjolainen
v2,v3 by Anuj Phogat:
- Apply for gen >= 11
- Remove wa_bug_xxx function
- Use helper functions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i915 will report ENODEV on generations prior to Haswell because there
is no point in reporting values on those. This is prior any fusing
could happen on parts with identical PCI ids.
This query call was previously only triggered on generations that
support performance queries, which happens to match generation for
which i915 reports topology, but the commit pointed below started
using it on all generations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1860
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
The order of comparison has changed, so we need to invert the logic of
"insert_left" when using rb_tree_insert_at().
Fixes: dae33052db (util/rb_tree: Reverse the order of comparison
functions).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Without this, we were DCEing flag writes because we didn't think their
results were used because we didn't understand that an ANY32 predicate
actually read all the flags.
Fixes: df1aec763e "i965/fs: Define methods to calculate the flag..."
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes invalid close(-1) in the unit tests.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
flrp was forgotten when already adding the rounding mode for other
instructions.
Fixes: ba1e25e1aa ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
After
1711bf6cf2 ("intel/fs: Generate better code for fsign multiplied by a value"),
the conflicts resolution for setting the rounding mode after the
fused fmul and fsign optimization is non obvious.
Basically, the optimization doesn't really result in a MUL, or any
other operation which would need to have the rounding mode set. Hence,
we set it just before the actual MUL in the treatment of fmul.
Fixes: ba1e25e1aa ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>