When a default value of a struct's field, which is in the
higher half of the first dword, is specified in a gen xml
file, setting op mask makes decoder treat the field as a
header (intel_field_is_header()). As a result, it won't
output the field in batch dump. This is not a common case
but can happen once a gen xml file includes such fields.
The op mask is only meaningful to instructions, so we fix
the above issue by not setting op mask of structs (also
registers).
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24268>
Before this patch, iris_resource_render_aux_usage would disable
compression when the clear color did not support format
reinterpretation.
With this patch, iris now replaces the clear color with zero and keeps
compression enabled. Disabling fast clears would be enough for most aux
usages, but replacement is also done to handle ISL_AUX_USAGE_FCV_CCS_E.
Note that this also fixes a bug. Format reinterpretation with
incompatible clear colors previously was not handled for the MCS aux
usages.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>
iris_resource_prepare_render will soon gain the ability to change a
resource's clear color. iris_blorp_surf_for_resource will keep a copy of
that clear color, so make sure calls to it happen after the render
preparation helper. At the moment, this shouldn't have an impact besides
improving debugging.
While we're here, do the same for the generic access preparation helper.
We may convert those to more specific helpers at a later time.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>
ISL's state-machine of CCS_D describes full resolves as leaving the aux
buffer in the pass-through state. Hardware doesn't behave this way on
gfx8 however. On that platform, full resolves transition the aux buffer
to the resolved state. This was verified by dumping the CCS before and
after a full resolve on BDW (gfx7 is simply assumed to behave the same).
Ambiguate after resolving to match driver expectations.
Prevents iris from failing piglit's fcc-write-after-clear on BDW with a
future patch which relies on fast-clear encodings being removed after a
resolve. The avoided failure is:
Testing implicit read of partial block UNORM -> SNORM
Probe color at (0,1,0)
Expected: 1.000000 1.000000 1.000000 1.000000
Observed: 0.000000 0.000000 0.000000 0.000000
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>
The field is no longer consumed by brw_complie_* and is instead handled
directly by the crocus driver. Therefore, it's safe to leave it zero
and not even bother setting it. This removes our reliance on the
SWIZZLE_* macros in prog_instructions.h.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24288>
The initial plan was to use 'nir_legacy' helpers but it turns out
that our RA pass is hard to confince to be happy with it. So we are
useing the 'chasing' helpers now.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
nir's source modifiers are going away soon and with it also the lowering
pass. Lets switch to our own lowering pass. We need to run our own
lowering pass almost at the end else opc_cse(..) etc. might do some
wrong needed opts as nir does not see our modifiers.
Also we need to remove the last nir_opt_dce(..) as it will remove not dead
code caused by the used load_const hack.
32 %15 = load_const (0x00000000 = 0.000000)
32 %4 = fabs %15 (0.000000)
nir_opt_dce is correct when it removes the two instructions. But in reality
the load_const is a uniform that should not be removed.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
This is more or less a copy of nir_lower_to_source_mods(..) with
the following differences:
- we store the source mods in pass_flags
- we do not try to saturate the destination
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
When we switch to our own lower_to_source_mods pass we will start
to see such patterns:
32x4 %18 = fneg %5 (-5.125000, -30.000000, 5.500000, -6.500000)
32x4 %19 = ffma %18, %8, %4 (-6.500000, -7.750000, 6.500000, 6.000000)
This is a problem as we will generate instruction that accesses two
different uniforms, which is a problem on GPUs where has_no_oneconst_limit
is false.
Make lower_alu(..) smarter by looking in the parent for for the constant
value.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
We only need to clear the 'dead' bits. The others are
used for source mods.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
As nir_lower_to_source_mods(..) will be deleted and with it the modifier storage
in nir's core we need to find an other way store the information.
We have have 6 bits left in nir's pass_flags - so lets go that route.
This also adds some small helpers that will be used later.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
As we are going to extend the enum etna_pass_flags it makes sense
to add a small helper to test if an instruction is dead. An instruction
is dead if BYPASS_DST or BYPASS_SRC is set.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
This enum is used for the pass_flags that can be set on a
nir_instr. Name it to make the intention of its usage clear.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
It helps to make the code easier to read.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>
The second operand is 16bit, so the we need to use VOP3 to address v128-v255.
Closes: #9413
Fixes: 6872f8d861 ("aco/gfx11: allow true 16-bit instructions to access v128+")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24294>
In the case of:
halt
// succs: b9
if %618 {
block b3:// preds:
break
// succs: b6
} else {
block b4: // preds: , succs: b5
}
block b5: // preds: b4
32 %556 = iadd %617, %2 (0x1)
opt_constant_if() doesn't work because stitch_blocks() can't join blocks if the
before ends in a jump and the after isn't empty.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>
Same cause as for other R8G8 formats - msaa resolve via
blit event causes gpu fault.
Fixes:
dEQP-VK.api.image_clearing.*.clear_color_attachment.*.r8g8_srgb_*
Fixes: 029919f3c8
("tu: allow using resolve engine for SRGB MSAA resolves")
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24277>
Both per-primitive and per-vertex space is allocated in MUE in 8 dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).
Some attributes have a special meaning and must be placed in separate
8/16-dword slot called Primitive Header or Vertex Header.
Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.
Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.
This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.
Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.
16/16 is better than 24/16, so packing makes sense.
Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.
24/24 is worse than 24/16, so packing doesn't make sense.
This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.
There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
float[N] requires N 1-dword slots, but
i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
if possible is beneficial
This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.
Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>