Crocus lowers this in the frontend, they key member is still used
but reset prior to backend.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>
There's nothing in bspec that would suggest this is still needed.
It only affected gfx 9 and 10.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14280>
In all three cases, COMPUTE was on the table but with an invalid
value (zero). Drop it from the tables and the extra assertion, so if
a COMPUTE is passed it will just fail the ARRAY_SIZE assertion.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14274>
Instead of masking the dirty variable itself, use an appropriate mask
in the users of dirty. This will avoid extra tracking when dealing
with Task/Mesh later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14275>
This adds a missing CFG edge that represents a possible physical
control flow path the EU might take under some conditions which isn't
part of the logical CFG of the program. This possibility shouldn't
have led to problems on platforms prior to Gfx12, since the missing
control flow edge cannot possibly influence liveness intervals.
However on Gfx12+ it becomes the compiler's responsibility to resolve
data dependencies across instructions, and the missing physical
control flow paths may lead to a WaR data hazard currently not visible
to the software scoreboard pass, which could lead to data corruption.
Worse, the possibility for this path to be taken by the EU increases
on Gfx12+ due to a hardware bug affecting EU fusion -- However the
same physical path can be potentially taken on earlier platforms as
well, so this patch extends the CFG on all platforms for consistency,
even though the lack of this edge shouldn't lead to any functional
issues on platforms earlier than Gfx12. There are no shader-db
changes on earlier platforms, so there seems to be no disadvantage
from using the same CFG representation as on later platforms.
This issue has ben reported on TGL with the following conformance
test, thanks to Ian for bringing the FULSIM dependency check warning
to my attention:
dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store
Fixes: 4d1959e693 ("intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4940
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Reported-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14248>
On XeHP+, Binding Table Pointers are an offset relative to the Surface
State Base Address anymore. Instead, they are relative to the State
Binding Table Pool Address, which is set by the command above.
We emit that command (pointing to the same address as the Surface
State Base Addresss), and everything should stay working as before.
Reworks:
* Jordan: Add iris
* Jordan: Drop i965
* Ken: Set MOCS to avoid a major perf impact. (Found by Felix DeGrood.)
* Jordan: Shrink size from 2MiB to actual iris, anv usage
* Lionel: Add BINDING_TABLE_POOL_BLOCK_SIZE
Ref: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4995
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: Add Iris, adjust sizes]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13992>
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author. Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op. When we throw all the instruction
ordering information away, we loose any help the author may have given
us. By attempting NONE before we fall back to the worst case LIFO mode.
And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.
total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
instructions in affected programs: 33669 -> 25719 (-23.61%)
helped: 20
HURT: 0
helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
95% mean confidence interval for instructions value: -867.61 72.61
95% mean confidence interval for instructions %-change: -21.74% -7.46%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
cycles in affected programs: 18620349 -> 18078769 (-2.91%)
helped: 104
HURT: 48
helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
HURT stats (abs) min: 10 max: 54724 x̄: 6118.62 x̃: 1530
HURT stats (rel) min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
95% mean confidence interval for cycles value: -5724.34 -1401.71
95% mean confidence interval for cycles %-change: -9.86% -4.10%
Cycles are helped.
total spills in shared programs: 12158 -> 10327 (-15.06%)
spills in affected programs: 1831 -> 0
helped: 20
HURT: 0
total fills in shared programs: 14749 -> 12635 (-14.33%)
fills in affected programs: 2114 -> 0
helped: 20
HURT: 0
LOST: 8
GAINED: 649
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
The way the current scheduler loop is implemented, each scheduling pass
starts with what the previous pass had. This means that, if PRE screwed
everything up majorly, PRE_NON_LIFO would have to try to fix it. It
also meant that tiny changes to one pass would affect every later pass.
Instead, reset the order of the instructions before each scheduling
pass. This makes the passes entirely independent of each other.
Shader-db results on Ice Lake:
total instructions in shared programs: 19670486 -> 19670648 (<.01%)
instructions in affected programs: 25317 -> 25479 (0.64%)
helped: 2
HURT: 7
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
HURT stats (abs) min: 8 max: 70 x̄: 24.29 x̃: 12
HURT stats (rel) min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87%
95% mean confidence interval for instructions value: -1.28 37.28
95% mean confidence interval for instructions %-change: -0.04% 2.30%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935535948 -> 935490243 (<.01%)
cycles in affected programs: 421994824 -> 421949119 (-0.01%)
helped: 1269
HURT: 879
helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52
helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14%
HURT stats (abs) min: 1 max: 29931 x̄: 322.46 x̃: 20
HURT stats (rel) min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22%
95% mean confidence interval for cycles value: -71.37 28.81
95% mean confidence interval for cycles %-change: -0.11% 0.21%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12403 -> 12430 (0.22%)
spills in affected programs: 1355 -> 1382 (1.99%)
helped: 2
HURT: 7
total fills in shared programs: 15128 -> 15182 (0.36%)
fills in affected programs: 3294 -> 3348 (1.64%)
helped: 2
HURT: 7
LOST: 21
GAINED: 28
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
This reverts commit ba2fa1ceaf. Doing
optimizations after scheduling but before RA means doing them in the
middle of the scheduling loop which introduces additional dependencies
between one scheduling iteration and the next. That won't work if we
want to make the scheduling modes independent, at least not unless we
have some way of fully cloning the IR.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
brw_find_next_block_end() scans through the instructions to find the end
of the block. We were calling it for every instruction in the program
which is, if you have a single basic block, makes the whole mess a nice
clean O(n^2) when it really doesn't need to be. Instead, only call
brw_find_next_block_end() as-needed. This brings it back to O(n) like
it should have been.
This cuts the runtime of the following Vulkan CTS on my SKL box by 5%
from 1:51 to 1:45: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro. This way, we get
nice INTEL_DEBUG=optimizer dumps for it. While we're here, fix the
header comment which is massively out-of-date.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Instead of modifying every single instruction, keep track of which VGRFs
are actually split in a bit-set, and only modify the instructions that
actually touch split regs.
This cuts the runtime of the following Vulkan CTS on my SKL box by 45%
from 3:21 to 1:51: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Putting it in the pipeline is a bit of a lie. We no longer need it for
nir_lower_wpos_center. The only other user is pipeline_has_coarse_pixel
and that is used to build the shader key which we construct before we've
processed any NIR so we don't have accurate information at that time
anyway. Instead, look at ms_info->sampleShadingEnable directly in
pipeline_has_coarse_pixel and trust the back-end to deal with disabling
coarse when we need per-sample dispatch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
This rolls compute_sample_position into emit_samplepos_setup, its only
caller, by using a loop instead of calling it twice. We also
early-return for the !persample_dispatch case instead of doing it as
part of the sample calculation. This means that we don't call
fetch_payload_reg() to get sample_pos_reg unless we're actually going to
use it so the function is safe to call even if we haven't set up
sample_pos_reg. This will be important for the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
Rather than using 2 vfuncs, use one since we've unified the
synchronization framework in the runtime with a single vk_sync object.
v2 (Jason Ekstrand):
- create_sync_for_memory is now in vk_device
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14237>
This is necessary to use intel_stub_gpu with Crocus.
v2: Remove unused i915_bo::swizzle_mode. Noticed by Emma.
Fixes: 953a4ca6fe ("intel: Add has_bit6_swizzle to devinfo")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14218>
This is by far the more common style in Mesa. It also gives a cue that
e.g. num_dependency_ids is a fixed definition rather than some kind of
local variable maintaining a count.
While hre, we also rename the enums to have full prefixes to prepare for
a future where we use them in multiple files for future backend work.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14182>
This replaces a bunch of uses of the GLSL IR ir_texture_opcode enum with
the backend opcode, in preparation for removing it altogether.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14191>
Rework:
* Jordan: Set MOCS after
7b78b2fcac ("intel/genxml: Assert that all MOCS fields are non-zero on Gfx7+")
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14212>
Having an integer destination type instead of a float destination type
confuses the SWSB code. This causes problems on some Intel GPUs. Fix
this by using the correct type in the destination of the F32TOF16
opcode.
Gfx7 doesn't have the HF type, so continue to emit W on that platform.
The assertions in brw_F32TO16 (brw_eu_emit.c) are updated to reflect
this. In scalar mode, UD is never emitted as a destination type for
this opcode, so remove it from the allowed types in the assertion.
I also condidered doing something like de55fd358f ("intel/fs/xehp:
Teach SWSB pass about the exec pipeline of
FS_OPCODE_PACK_HALF_2x16_SPLIT."), but Curro recommended that just using
the correct types is a better fix. I agree.
v2: Add missing changes to fs_generator::generate_pack_half_2x16_split.
I'm not sure how I (and the Intel CI) missed that the first time. :(
v3: Fix copy-and-paste issue in the v2 fix. Noticed by Tapani.
Reviewed-by: Francisco Jerez <currojerez@riseup.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14181>
Based on Rafael's:
* "nir/lower_tex: Add option to lower offset for tg4 too."
* "intel/compiler: Lower offsets for tg4 on gen9+."
* "WIP: Do not lower basic offsets."
* "WIP: intel/compiler: Enable lowering offsets restriction."
But, with these changes:
* Fixed range checking to be signed 4 bits
* Converted to filter
* Apply only to gfx12.5+
* Use nir_src_is_const / nir_src_comp_as_int (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>
We were setting anv_pipeline::sample_shading_enable based on
sampleShadingEnable without looking at minSampleShading. We would then
pass this value into nir_lower_wpos_center which would add sample_pos to
frag_coord. Then the back-end compiler picks up on the existence of
sample_pos and forces persample dispatch. This leads to doing
per-sample dispatch whenever sampleShadingEnable = VK_TRUE regardless of
the value of minSampleShading. This is almost certainly costing us
perf somewhere.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14022>
The alignment and scale down values have changed on this platform.
To support drivers that won't use a CCS surface on this platform, this
patch computes the CCS fast clear rectangle using the main surface.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
According to Bspec 2424, "Render Target Resolve":
The Resolve Rectangle size is same as Clear Rectangle size from SKL+.
Use get_fast_clear_rect in blorp_ccs_resolve for SKL+.
Note that the Bspec differs from Vol7 of the Sky Lake PRM, which only
specifies aligning by the scaledown factors.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
The comment states that 64K alignment of surfaces is required when an
aux map is present on the platform. However, the code checks for GFX12
instead of dev->info->has_aux_map. Update the code to match the comment.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>