Iris only runs on BDW+ and ANV already handles this by not even trying
on anything older than HSW. The only driver benefiting from this common
check is i965. Moving it out makes the pass more generic and if some
driver comes along which can push UBOs on IVB, it should work for that.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11145>
On XeHP there are restrictions on types of source and destinations
with float types. As shuffle is implemented using MOV we need to make
sure we lower it to supported types.
This fixes tests like :
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemax_vec4_vertex
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemul_f16vec3_vertex
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10902>
Fixes perf regression introduced from tileY LID order for CS
shaders that access both textures and buffers. Walks LIDs in
X-major fashion, but with blocks of height 4. This maps LIDs per
HW thread for SIMD8/16/32 as (2x4/4x4/8x4), which is always good
for tileY resources and usually good for linear resources.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
Computer shaders that access tileY resources (textures) benefit
from Y-locality accesses. Easiest way to implement this is walk
local ids in Y-major fashion, instead of X-major fashion. Y-major
local ids will reduce partial writes and increase cache locality
for tileY accesses since tileY resources cachelines progress in
Y direction.
Improves performance on TGL:
Borderlands3.dxvk-g2 +1.5%
Y-major can introduce a performance drop on CS that use mixture
of buffers and images. This should be fixed in next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
This is the vec4 equivalent of d0d039a4d3, required for proper UBO
pushing in vertex stages for Vulkan on HSW. Sadly, the implementation
requires us to do everything in ALIGN1 mode and the vec4 instruction
scheduler doesn't understand HW_GRF <-> UNIFORM interference so it's
easier to do the whole thing in the generator. We add an instruction
to the top of the program which just means "emit the blob" and all the
magic happens in codegen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
In order to avoid switching pull constants to push constants and then
having to back to pull, compute the push ranges up-front. This way we
know by the time we emit code exactly what ranges are pushable. This is
a bit inefficient in the case where the "normal" push constants get
compacted. However, most apps don't use giant piles of dead uniforms
combined with substantial UBO use so this should be ok.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
The way we handle spilling for fp64 in vec4 is to emit a series of MOVs
which swizzles the data around and then a pair of 32-bit spills. This
works great except that the next time we go to pick a spill reg, the
compiler isn't smart enough to figure out that the register has already
been spilled. Normally we do this by looking at the sources of spill
instructions (or destinations of fills) but, because it's separated from
the actual value by a MOV, we can't see it. This commit adds a new
opcode VEC4_OPCODE_MOV_FOR_SCRATCH which is identical to MOV in
semantics except that it lets RA know not to spill again.
Fixes: 82c69426a5 "i965/vec4: support basic spilling of 64-bit registers"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
FS will get the last geometry VUE, but it still needs to recompute in
case the number of position slots assigned by geometry is larger than
one -- this happens when Primitive Replication is used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10653>
This is most likely a rebase mistake :(
Fixes: f3e5cd813a ("intel/fs: Handle regioning restrictions of split FP/DP pipelines.")
Ref: aa53665fda ("intel/fs/copy_prop: check stride constraints with actual final type")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10764>
Now that all drivers are using brw_cs_get_dispatch_info() we can
remove one function (which is now unused) and reduce the scope of the
other.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10504>
We have this small calculations repeated in each Intel driver, so move
them to a single place to be reused. Also includes "right_mask" since
is always used in the same context and depends on the dispatch info
values.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10504>
There's a recently discovered HW bug affecting hardware at least as far
back as Skylake where, if the LOD is out-of-bounds for any SIMD lane,
then garbage may be returned in all SIMD lanes. The easy solution is to
set lower_txs_lod so that we always have a constant LOD of 0 which we
know a priori is always in-bounds. Fortunately, not many shaders
actually use textureSize() with LOD.
Shader-db results on Ice Lake:
total instructions in shared programs: 19948537 -> 19948564 (<.01%)
instructions in affected programs: 3859 -> 3886 (0.70%)
helped: 0
HURT: 7
One of the shaders is in Civilization: Beyond Earth, and the rest are
all in Civilization VI.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10538>
The source_depth_to_render_target flag can get set on old gen4-5 HW in a
few cases which are independent of the app writing gl_FragDepth. It
should be safe to just use fetch_payload_reg in that case instead of
depending in interpolation setup. This fixes a bug with certain very
simple shaders where we might end up not including the depth when we
should have.
While we're here, rework the logic around setting src_depth and add a
comment so it's more clear what's going on.
Fixes: 6d4070f3dd "intel/compiler: add support for fragment coordinate..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10596>
There were two bugs which crep in here as part of 64551610d1:
forgetting that exec sizes in HW are in log2 space and having the
exec_size condition for the subtype backwards.
Fixes: 64551610d1 "intel/compiler: rework message descriptors..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10588>
Those helpers exist primarily to sort out some of the weirdness around
Gen4-6 dataport access. On Gen5 and earlier, everything was called
"dataport" and, instead of the SFID we have today there was a "target
cache" parameter in the descriptor. There are also some bits that moved
around on various gens depending on read vs. write. Starting with Gen6,
most things which target one of the data cache SFIDs should use
brw_dp_desc() instead.
v2: Drop backward comment (Ken)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
It's a Gen6 XFB thing. It's never used for anything else so there's no
point in having a target cache switch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
v2: Drop new internal opcodes (Jason)
Simplify code (Jason)
v3: Add Z computation for coarse pixels
v4: Document things a little
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
v2: Use the new inst->ex_desc field (Jason)
v3: Drop CPS LoD compensation from sampler messages (Lionel)
v4: Drop useless uses_rate_shading (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
Render target message descriptors are slightly different from the
dataport ones. In particular the msg_type field is on bits 14:17 for
RT while bits 14:18 for DP.
v2: Drop unused send_commit_msg field in brw_fb_write_desc() (Ken)
v3: Rebase on top renaming (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
With the missing else, this prints the compacted hex followed by hex
for an uncompacted version of the compacted instruction. It also
doesn't print hex for instructions that are not compacted.
Fixes: bc4a127d6e ("intel/disasm: Label support in shader disassembly for UIP/JIP")
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4245
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10535>
Normally, we never see NULL in a source. However, starting with
eab1c55590, we can with a SHADER_OPCODE_SEND if it only has the first
payload. We were inserting barriers which adds unnecessary scheduling
dependencies and takes a lot of compile time because inserting a single
barrier is an O(n) operation.
All the extra O(n) can have a surprisingly large effect. This cuts the
runtime of dEQP-VK.binding_model.buffer_device_address.set3.depth3.
basessbo.convertcheckuv2.store.single.std140.frag by a factor of 20x for
a debug build.
Shader-db results on ICL:
total instructions in shared programs: 19918983 -> 19921610 (0.01%)
instructions in affected programs: 884074 -> 886701 (0.30%)
helped: 1688
HURT: 817
helped stats (abs) min: 1 max: 163 x̄: 4.23 x̃: 1
helped stats (rel) min: 0.02% max: 12.50% x̄: 1.08% x̃: 0.61%
HURT stats (abs) min: 1 max: 2674 x̄: 11.95 x̃: 2
HURT stats (rel) min: 0.11% max: 70.22% x̄: 1.71% x̃: 1.03%
95% mean confidence interval for instructions value: -1.97 4.06
95% mean confidence interval for instructions %-change: -0.28% -0.06%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 976503324 -> 975884809 (-0.06%)
cycles in affected programs: 82581703 -> 81963188 (-0.75%)
helped: 4144
HURT: 5010
helped stats (abs) min: 1 max: 79294 x̄: 311.31 x̃: 8
helped stats (rel) min: <.01% max: 53.69% x̄: 2.00% x̃: 0.51%
HURT stats (abs) min: 1 max: 92266 x̄: 134.04 x̃: 8
HURT stats (rel) min: <.01% max: 218.09% x̄: 3.25% x̃: 0.53%
95% mean confidence interval for cycles value: -119.85 -15.29
95% mean confidence interval for cycles %-change: 0.68% 1.07%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).
total spills in shared programs: 10659 -> 12014 (12.71%)
spills in affected programs: 441 -> 1796 (307.26%)
helped: 7
HURT: 12
total fills in shared programs: 11551 -> 14429 (24.92%)
fills in affected programs: 993 -> 3871 (289.83%)
helped: 8
HURT: 11
total sends in shared programs: 1025832 -> 1025353 (-0.05%)
sends in affected programs: 2241 -> 1762 (-21.37%)
helped: 105
HURT: 1
helped stats (abs) min: 1 max: 87 x̄: 4.57 x̃: 2
helped stats (rel) min: 5.56% max: 54.72% x̄: 11.37% x̃: 10.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for sends value: -7.39 -1.65
95% mean confidence interval for sends %-change: -12.95% -7.70%
Sends are helped.
LOST: 93
GAINED: 109
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4648
Fixes: eab1c55590 "intel/fs: Support SENDS in SHADER_OPCODE_SEND"
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10412>
This patch includes a number of reworks and fixes squashed in by
Nanley Chery, Sagar Ghuge, Jordan Justen and Francisco Jerez.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>