CoverityID: 1487496
Fixes: cde9ca616d "intel/compiler: Make decision based on source type instead of opcode"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11985>
This patch restructure code a little bit to check if source can be
represented as immediate operand. This is a foundation for next patch
which add checks for integer operand as well.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
This is distinct form max_cs_threads because it also encodes
restrictions about the way we use GPGPU/COMPUTE_WALKER. This gets rid
of the MIN2(64, devinfo->max_cs_threads) we have scattered all over the
driver and puts it in a central place.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11861>
No difference proven at 95.0% confidence (n=10) in
dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13.
v2: Only update each block's IP data once instead of once per block.
Suggested by Emma.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11632>
Performance improvement in
dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 for n=30:
release build (w/Fedora build flags): -0.82% ± 0.23%
Meson -Dbuildtype=debugoptimized: -0.74% ± 0.27%
The difference in the debugoptimized build is the calls to
inst_is_in_block(block, this) still exist on each call to remove().
v2: Only update each block's IP data once instead of once per block.
Suggested by Emma.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11632>
Performance improvement in
dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 for n=30:
release build (w/Fedora build flags): -7.79% ± 0.25%
Meson -Dbuildtype=debugoptimized: -5.10% ± 0.40%
The difference in the debugoptimized build is the calls to
inst_is_in_block(block, this) still exist on each call to remove().
v2: Only update each block's IP data once instead of once per block.
Suggested by Emma.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11632>
We say that they're for debug only but we don't really have a good
policy around when to set them and when not to. In particular,
nir_lower_system_values and nir_lower_vars_to_ssa which are the chief
producers of SSA values which might reasonably have a name do not bother
to set one. We have some names set from things like BLORP and RADV's
meta shaders but AFAICT, they're setting a name more because it's there
than because they actually care.
Also, most things other than nir_clone and nir_serialize don't bother to
try and preserve them. You can see in the diffstat of this commit
exactly what passes attempt to preserve names. Notably missing from the
list is opt_algebraic which is the single largest source of SSA def
churn and it happily throws names away.
These observations lead me to question whether or not names are actually
useful at all or if they're just taking up space (8B per instruction)
and wasting CPU cycles (to ralloc_strdup on the off chance we do have
one). I don't think I can think of a single time in recent history
where I've been debugging a shader issue and a SSA value name has been
there and been useful. If anything, the few times they are there, they
just throw me off because they mess up the indentation in nir_print.
iris shader-db on my system gets runtime -2.07734% +/- 1.26933% (n=5)
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5439>
Qualcomm has a mode with a subgroup size of 128, so just emitting larger
integer operations and then lowering them later isn't an option. This
makes the pass able to handle the lowering itself, so that we don't have
to go down to 64-thread wavefronts when ballots are used.
(The GLSL and legacy SPIR-V extensions only support a maximum of 64
threads, but I guess we'll cross that bridge when we come to it...)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
We set the ex_desc to 0, since the address surface type is FLAT.
v2 (Sagar Ghuge):
- Fix message descriptor encoding
v2 (Jason Ekstrand):
- Drop support for block messages
Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
Bspec programming note metions that "Atomic messages are always forced
to "un-cacheable" in the L1 cache". We can make the L1 cache
un-cacheable and L3 with write-back policy.
v2: (Sagar Ghuge):
- Fix caching policy for atomic messages
- Fix simd exec size
v3: (Sagar Ghuge):
- Add atomic messages to brw_schedule_instructions
v4: (Jason Ekstrand):
- Rebase on lsc_msg_desc reworks
Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
This puts the basic infrastructure in place for lowering logical
dataport messages to LSC messages. We start with the two most obvious
opcodes and add more in later patches.
v2 (Sagar Ghuge):
- Pass required params to message desc
- Remove duplicate mlen calculation
- Change commit message.
v3 (Jason Ekstrand):
- Drop TGM support
Co-authored-by: Jason Ekstrand <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
v2 (Jason Ekstrand):
- Use lsc_msg_desc_opcode()
- Drop all opcodes for now and add them in later patches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
v2: (Sagar Ghuge):
- rename addr_reg_size to src0_len to match with bspec
v3 (Jason Ekstrand):
- Re-arrange things in increasing bit order
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
v2 (Jason Ekstrand):
- Squash all the similar patches together
- Re-arrange and rename some things to be more consistent
- Add a lsc_opcode_has_cmask helper
- Drop is_one_addr_reg
v3 (Jason Ekstrand):
- Add transpose
- Re-order arguments to make more logical sense
- Switch from `write` to `has_dest`
Co-authored-by: Mark Janes <mark.a.janes@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
Xe-HPG comes with a massively reworked dataport. The new thing, called
Load/Store Cache or LSC, has a significantly improved interface.
Instead of bespoke messages for every case, there's basically one or two
messages with different bits to control things like address size, how
much data is read/written, etc. It's way nicer but also means we get to
rewrite all our dataport encoding/decoding code. This patch kicks off
the party with all of the new enums.
v2 (Jason Ekstrand, Mark Janes):
- Rename to LSC
v3 (Jason Ekstrand):
- Add numbers to all enums
Co-authored-by: Mark Janes <mark.a.janes@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
Surely the compiler would sort that out, you would think. But no, my
debugoptimized build improves
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime by 25%
on my SKL from this change.
This was the slowest test in the GLES31 tests on APL in CI, at 22s. And
yes, we were spending around half of our runtime in this function.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11631>
"regs" is an array of 2 ->
"m" must be <= 2 ->
"components" array can be allocated on the stack
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11575>
Right now the accumulator-clearing move emitted by the generator for
Wa_14010017096 inherits the SWSB field from the previous instruction.
This can lead to redundant synchronization, or possibly more serious
issues if the previous instruction had a TGL_SBID_SET SWSB
synchronization mode. Take the SWSB synchronization information from
the IR.
Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This is unlikely to have had any negative side effect on the original
TGL, but will lead to issues on XeHP+ if the software scoreboard pass
isn't able to synchronize the accumulator writes.
Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
In cases where an in-order instruction is overwriting a register
previously read by another in-order instruction, drop the dependency
iff the previous read is guaranteed to have occurred from the same
in-order pipeline. This should only have an effect on XeHP+ since
previous Xe platforms only had one in-order FPU pipeline.
The previous workaround we were using for this treated all ordered
read dependencies as write dependencies to avoid noise from our
simulation environment. Relative to our previous workaround this
improves performance of GFXBench5 gl_tess by ~7% on a DG2 system
among other single-digit percentual FPS improvements.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>