This more coarse-grained hash information for compiler (vs. full
devinfo), used only by Iris and Anv, and relevant for more recent
platforms. Remove it from elk.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27563>
For now is not linked to any driver. The tools were renamed to use elk
prefix to avoid conflicting with the brw ones. The run-test.py script
was also updated due to that change.
Before the new compiler can be linked together with the old (going to be
done for Iris and other tools), the symbol conflicts need to be fixed
first. This will happen in a later commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27563>
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset. To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.
These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick. In fact, it can be better not to:
our offset need only be 4 byte (scalar) aligned, but we were making it
16 byte (vec4) aligned. So if you wanted to load an unaligned vec2,
we might actually load two vec4's (___X | Y___) instead of doing a
single load at the starting offset.
This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.
shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3
fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26
- Totals from 56285 (8.63% of 652236) affected shaders:
- Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
- Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%
From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
With the advent of software scoreboarding, we emit sync instructions
in various places to synchronize the execution pipelines. This results
in assembly being littered with a bunch of sync.nop instructions. That
means that when you reorder anything in the program, the scoreboarding
changes, and the number of sync.nops can vary wildly - even if the code
isn't really materially better or worse. This makes it hard to use
tools like shader-db or fossil-db on post-Icelake platforms.
For now, exclude sync.nops from the instruction count statistic. One
day we may want to consider improving the software scoreboarding pass
to emit fewer redundant sync.nop instructions, at which point tracking
this as a separate stat might be useful. For now though, it's simply
cluttering and confusing our results.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27701>
There are a couple mistakes here :
- using a bitfield as an index to generate a bitfield...
- in anv_nir_push_desc_ubo_fully_promoted(), confusing binding
table access of the descriptor buffer with actual descriptors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ff91c5ca42 ("anv: add analysis for push descriptor uses and store it in shader cache")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27504>
Currently, Intel's shader cache incorporates PCI ID into shader cache
keys.
Many devices with different PCI IDs have identical shader compilation
functionality. Using PCI ID as a component of the shader cache hash
means that a multi-platform shader cache will have redundant,
identical entries for similar platforms.
All Intel compiler functionality is selected based on device
configuration in `struct intel_device_info`. intel_device_info.py
flags all fields accessed by intel/compiler.
This commit generates a hash function incorporating intel/compiler
device info fields. Using this hash function in place of PCI ID will
produce a multiplatform cache with no duplicated content.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26844>
And move them inside the compiler since they (especially asm) rely on
a bunch of internal types.
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27579>
This will be used to generate a serialized NIR of functions for
internal shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
We have to push the lowering of texture operations a bit further in
pipeline since nir_lower_tex gets invoked twice and if there is no LOD
source present, nir_lower_tex adds that as a source. Once that's all
done we can easily combine the LOD and array index into a single 32-bit
value.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
Since this lowering is totally Intel specific, we don't have to
introduce the new texture source. We can use the nir_tex_src_backend1
source to pack LOD/LOD Bias and array index into 32 bit single value.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
This pass combines the LOD or LOD bias and array index into a single
32-bit value since Xe2+ sampler messages requires us to do that.
v2: (Alyssa)
- Use nir_iand_imm instead of nir_iand and nir_imm_int
- Use nir_trim_vector instead of nir_swizzle
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
Ref: bspec 55414
Ref: 951e08fc18 ("intel/compiler: Disable DPAS instructions on MTL")
Suggested-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27352>
Since SIMD8 no longer exists, the SIMD modes enums have different names
and different values.
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3: Update brw_disasm.c with SIMD descriptions.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).
v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Note: a future commit will expand the sampler message type to the 6 bits
used on Xe2.
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3: Drop XE2_SAMPLER_MESSAGE_SAMPLE_BIAS_MLOD as it does not actually
exist. This resulted in some bigger changes in brw_disasm.c. Noticed
by Sagar.
v4: Now that XE2_SAMPLER_MESSAGE_SAMPLE_MLODc conflicts with
GFX7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C, the determination of
min_lod_is_first must include devinfo->ver or previous platforms will
break.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Message types are expanded to 6-bit encoding now. 5 bits are still the
same field from the Sampler Message Descriptor. The most significant bit
is now bit 31 of the Sampler Message Descriptor. The messages that have
'1 in bit 6 are only to support programmable offsets and those would
require message header. If a sampler type shows only 5 bits encoding, it
is implied bit 6 equal to 0 and there is no requirement for header.
v2 (idr): Trivial formatting changes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
This small refactor simplifies a later commit that will optionally emit
some opcodes before the switch (as is already done with the shadow
comparitor).
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3 (Jordan): SHADER_OPCODE_TXL => SHADER_OPCODE_TXL_LZ (was
SHADER_OPCODE_TXF_LZ).
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
The Bspec also says, "The table below describes the SIMD modes which
are supported. SIMD32 and SIMD64 are used for media-type operations
only." Perhaps this commit should just add
if (devinfo->ver >= 20)
return 16;
instead.
v2: Use reg_unit in get_sampler_lowered_simd_width. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
The optimization pass will (eventually) turn the imul into a
umul_32x16. In many cases, the multiply will be converted to something
else.
I also tried cloning a bunch of existing imul algebraic patterns for
[iu]mul_32x16. This produced the same result, but it was a lot more
churn.
All of the shaders affected were ray tracing shaders in Q2RTX. This is
the only ray tracing workload in my fossil-db.
DG2
Totals:
Instrs: 191995626 -> 191995079 (-0.00%); split: -0.00%, +0.00%
Cycles: 14003803561 -> 14003798040 (-0.00%); split: -0.00%, +0.00%
Spill count: 108320 -> 108288 (-0.03%)
Fill count: 200695 -> 200663 (-0.02%)
Scratch Memory Size: 8755200 -> 8754176 (-0.01%)
Totals from 7 (0.00% of 652118) affected shaders:
Instrs: 14998 -> 14451 (-3.65%); split: -3.94%, +0.29%
Cycles: 137222 -> 131701 (-4.02%); split: -4.10%, +0.07%
Spill count: 32 -> 0 (-inf%)
Fill count: 32 -> 0 (-inf%)
Scratch Memory Size: 19456 -> 18432 (-5.26%)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27161>
This adds a few new fields in the brw_cs_prog_data struct and then
uses them to fill in the relevant COMPUTE_WALKER fields.
Although the Tile Layout field theoretically has different settings for
32/64/128bpe, it appears that the recommended programming is to always
pick either TileY 32bpe or Linear. It's not very practical to look at
the surface formats involved, anyway.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>