We don't usually document the details about the various fields here.
Let's drop these, as the comments don't exist in future docs either,
making it a bit easier to diff them.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
The distinction between 8 and 4 bits here is largely meaningless, because
the extra bits are zero regardless. This matches what the spec says, and
is also consistent with the other XML definitions we have. So let's make
it consistent so we can more easily diff the XML files to see what
changed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
Most of the XML files do this, so let's do the same here. This
shouldn't matter in practice, as we always set the field anyway.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This makes things more consistent, but shouldn't make a practical
difference apart from reducing needless diffs.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
We don't specify these for V10, and the default is the zero-value
anyway. Let's drop them to simplify things and reduce needless diffs.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
We do this for all masks from V12 and later, but not always in earlier
gens. Let's fix this up, both to produce cleaner dumps and to reduce
needless diffs between the files.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This is what cs_sync32_add etc. expect. Not sure why this isn't
producing compile-time errors, but we should be consistent here anyway.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This is what we're doing for later gens, so let's be consistent. While
we're at it, fix up the casing of the names as well.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
"Scissor mode" is the name that the HW spec gives to this bit, but it's
more accurately described as "Scissor to bounding box", which is what we
use for V9 and V10. It seems that name was picked intentionally over what the
HW spec calls it.
We should do the same for later gens as well, as this keeps the code
simpler.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
The field here is 8 bits wide according to the spec. However, because all
values that take more than two bits are reserved, this doesn't actually
lead to any misbehavior. But let's make this consistent with the spec
and newer XML files.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This field is specified to be 4 bits wide, not 3. This doesn't make a
practical difference, because all values with the top bit set are
undefined, so it will always be zero. But we should get it right to
reduce needless diffs here.
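As a rough illustration of why the width makes no practical difference, here is a minimal sketch of generic bitfield extraction. get_field is a hypothetical helper, not the code generated from the XML. As long as the top bit of the 4-bit field is zero, reading the field as 3 or 4 bits yields the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract a `width`-bit field starting at bit `shift`
 * of a 32-bit descriptor word. */
static uint32_t get_field(uint32_t word, unsigned shift, unsigned width)
{
   assert(width >= 1 && width < 32 && shift + width <= 32);
   return (word >> shift) & ((1u << width) - 1u);
}
```

If the top bit were ever set (an undefined value per the spec), the 3-bit read would silently drop it. With the correct 4-bit width, the dump shows the actual value.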
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
Most of the work to support predicating draws is already done; what
remained was mainly to support predicating dispatches and wire it up.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41839>
The scheduler expects dest values that are marked as pin_group to be
used as src values in some instruction that takes a vec4 as source;
otherwise, the free channels in the vec4 group are not evaluated correctly.
Fix the extra instructions emitted when lowering buf_txf to backend IR so
that they use free ALU dest registers.
Fixes: 13b1069a87 ("r600/sfn: Handle pre-EG buffer fetch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15433
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41835>
If we already have 2.0, do not add a separate -2.0. The negate is free
if the constant is only used as a scalar. We could also do this for
constant vectors, but only for VS, since FS only supports per-source
negate, not per-channel negate, so keep it simple for now.
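Here is a minimal sketch of the scalar case. It assumes a simplified immediate pool: struct const_pool, struct const_ref and pool_get_scalar are hypothetical, not the r300 compiler's data structures. When -value is already present, the existing slot is reused with a negate source modifier instead of allocating a new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 32 /* hypothetical limit, mirrors the FS slot budget */

/* Hypothetical, simplified pool of scalar immediates. */
struct const_pool {
   float values[POOL_SIZE];
   size_t count;
};

/* Reference to an immediate plus the source negate modifier. */
struct const_ref {
   size_t index;
   bool negate;
};

/* Find or add a scalar immediate. If its negation is already in the pool,
 * reuse that slot with negate set, since the modifier is free for scalars. */
static bool pool_get_scalar(struct const_pool *pool, float value,
                            struct const_ref *out)
{
   assert(pool && out);
   for (size_t i = 0; i < pool->count; i++) {
      if (pool->values[i] == value) {
         out->index = i;
         out->negate = false;
         return true;
      }
      if (pool->values[i] == -value) {
         out->index = i;
         out->negate = true;
         return true;
      }
   }
   if (pool->count == POOL_SIZE)
      return false; /* out of slots */
   pool->values[pool->count] = value;
   out->index = pool->count++;
   out->negate = false;
   return true;
}
```

Vectors are deliberately left out. A per-channel sign difference cannot be expressed with a single per-source negate in FS, which is the limitation mentioned above.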
Shader-db RV410:
total consts in shared programs: 86348 -> 86272 (-0.09%)
consts in affected programs: 6036 -> 5960 (-1.26%)
helped: 76
HURT: 0
total cycles in shared programs: 175335 -> 175332 (<.01%)
cycles in affected programs: 10868 -> 10865 (-0.03%)
helped: 56
HURT: 30
total temps in shared programs: 19487 -> 19510 (0.12%)
temps in affected programs: 362 -> 385 (6.35%)
helped: 9
HURT: 8
total instructions in shared programs: 118451 -> 118461 (<.01%)
instructions in affected programs: 8105 -> 8115 (0.12%)
helped: 51
HURT: 29
LOST: 3
GAINED: 8
Most notably, we compile all glamor shaders again, gain 2 tropics ones,
and trade 3 lost for 2 gained in gsk, which doesn't matter much since gsk
will fall back to software after the first linking failure anyway.
RV530:
total cycles in shared programs: 191425 -> 191385 (-0.02%)
cycles in affected programs: 6249 -> 6209 (-0.64%)
helped: 45
HURT: 12
total consts in shared programs: 94030 -> 93967 (-0.07%)
consts in affected programs: 5803 -> 5740 (-1.09%)
helped: 63
HURT: 0
total temps in shared programs: 17037 -> 17040 (0.02%)
temps in affected programs: 49 -> 52 (6.12%)
helped: 1
HURT: 3
total instructions in shared programs: 128823 -> 128789 (-0.03%)
instructions in affected programs: 6164 -> 6130 (-0.55%)
helped: 49
HURT: 19
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41618>
After removing the TGSI layer, load_const values will be emitted directly
as RC immediates without the scalar packing that tgsi_ureg used to do.
This can push fragment shaders past the 32-slot hardware limit on R3xx/R4xx.
Swap dead_constants and dataflow_swizzles pass order so constant
compaction runs before swizzle legalization, giving the legalization
pass an accurate slot count to work with.
In rc_remove_unused_constants, when the slot budget is tight on
R3xx/R4xx, enable aggressive packing for vec-used immediates.
Deduplicate repeated values within an immediate and merge subsequent vec
immediates into existing slots by matching values and filling free
channels.
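The packing step can be sketched roughly as below. This is a simplified first-fit model, not the actual rc_remove_unused_constants code: struct slot, struct slot_pool, try_place and pack_immediate are hypothetical. The sketch also models neither the "budget is tight" trigger nor swizzle legality. Each vec4 slot tracks which channels are used. An immediate reuses channels that already hold a matching value, which covers both duplicates within the immediate and values already in the slot, and fills free channels otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 32

/* Hypothetical vec4 constant slot with per-channel occupancy. */
struct slot {
   float v[4];
   bool used[4];
};

struct slot_pool {
   struct slot slots[MAX_SLOTS];
   size_t count;
};

/* Try to fit n values into slot s. On success, swz[i] is the channel that
 * holds values[i]; on failure, the slot is left unmodified. */
static bool try_place(struct slot *s, const float *values, size_t n, int swz[4])
{
   float v[4];
   bool used[4];
   for (int c = 0; c < 4; c++) {
      v[c] = s->v[c];
      used[c] = s->used[c];
   }
   for (size_t i = 0; i < n; i++) {
      int chan = -1;
      /* Deduplicate: reuse a channel already holding this value. */
      for (int c = 0; c < 4 && chan < 0; c++)
         if (used[c] && v[c] == values[i])
            chan = c;
      /* Otherwise fill the first free channel. */
      for (int c = 0; c < 4 && chan < 0; c++)
         if (!used[c])
            chan = c;
      if (chan < 0)
         return false;
      used[chan] = true;
      v[chan] = values[i];
      swz[i] = chan;
   }
   for (int c = 0; c < 4; c++) {
      s->v[c] = v[c];
      s->used[c] = used[c];
   }
   return true;
}

/* Pack an immediate of n (1..4) values; returns the slot index or -1. */
static int pack_immediate(struct slot_pool *pool, const float *values,
                          size_t n, int swz[4])
{
   assert(n >= 1 && n <= 4);
   for (size_t s = 0; s < pool->count; s++)
      if (try_place(&pool->slots[s], values, n, swz))
         return (int)s;
   if (pool->count == MAX_SLOTS)
      return -1;
   struct slot *fresh = &pool->slots[pool->count];
   for (int c = 0; c < 4; c++)
      fresh->used[c] = false;
   bool ok = try_place(fresh, values, n, swz); /* 4 channels always suffice */
   assert(ok);
   (void)ok;
   return (int)pool->count++;
}
```

For example, packing {1.0, 1.0, 0.5, 0.5} and then {0.5, 2.0} uses a single slot. The second immediate reuses the channel holding 0.5 and takes the next free channel for 2.0.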
Very small win on R5xx and very small hit on R3xx/R4xx (due to the
smaller number of legal swizzles).
Shader-db RV530:
total cycles in shared programs: 191452 -> 191425 (-0.01%)
cycles in affected programs: 5168 -> 5141 (-0.52%)
helped: 24
HURT: 10
total temps in shared programs: 17046 -> 17037 (-0.05%)
temps in affected programs: 201 -> 192 (-4.48%)
helped: 11
HURT: 5
total consts in shared programs: 94033 -> 94030 (<.01%)
consts in affected programs: 277 -> 274 (-1.08%)
helped: 5
HURT: 5
total instructions in shared programs: 128840 -> 128823 (-0.01%)
instructions in affected programs: 3588 -> 3571 (-0.47%)
helped: 25
HURT: 12
RV410:
total cycles in shared programs: 176230 -> 176270 (0.02%)
cycles in affected programs: 20598 -> 20638 (0.19%)
helped: 51
HURT: 66
total temps in shared programs: 19655 -> 19650 (-0.03%)
temps in affected programs: 1310 -> 1305 (-0.38%)
helped: 37
HURT: 25
total instructions in shared programs: 119346 -> 119379 (0.03%)
instructions in affected programs: 13884 -> 13917 (0.24%)
helped: 58
HURT: 65
total consts in shared programs: 86146 -> 86412 (0.31%)
consts in affected programs: 3093 -> 3359 (8.60%)
helped: 8
HURT: 182
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41618>
This uses global_atomic_ordered_add_b64 to implement Ordered Append
and experimentally measure its memory throughput.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41452>
The xe.ko kernel driver sets the ROW_CHICKEN bit to disable Early EOT
on all revisions of Xe2, I believe as far back as 6.10. Although Xe2
doesn't have the variable registers per thread feature of Xe3, it still
has Large GRF mode that can be switched on and off, and there are issues
with combining the two features. Plus, apparently this wasn't observed
to help much with performance.
That means that EOT sends are no longer special, and we don't need to
restrict ourselves to r112-r127. Relax the validator so Jay can use this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Co-authored-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41831>
The existing util_get_narrow_range_coeffs doesn't work for RGB, since
all channels in RGB will share the same scale and bias.
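To illustrate the difference, here is a minimal sketch assuming BT.709-style quantization. Narrow-range code = (span * x + 16) * 2^(bits - 8), with span = 219 for luma and for every RGB channel, and span = 224 for chroma (chroma offset by 0.5 so it is in [0, 1] as well). narrow_range_coeffs and struct range_coeffs are hypothetical and do not mirror the signature of util_get_narrow_range_coeffs.

```c
#include <assert.h>

/* Per-channel coefficients for out = in * scale + bias, normalized to [0, 1]. */
struct range_coeffs {
   float scale[3];
   float bias[3];
};

/* Hypothetical helper: narrow-range coefficients for YCbCr or RGB. */
static struct range_coeffs narrow_range_coeffs(unsigned bits, int is_rgb)
{
   assert(bits >= 8 && bits <= 16);
   const float max_code = (float)((1u << bits) - 1u);
   const float step = (float)(1u << (bits - 8u)); /* 8-bit codes scale up */
   struct range_coeffs c;

   for (int i = 0; i < 3; i++) {
      /* YCbCr: luma spans 219 codes, chroma 224. RGB: R, G and B all use
       * the same 219-code range, so they share one scale and bias. */
      const float span = (is_rgb || i == 0) ? 219.0f : 224.0f;
      c.scale[i] = span * step / max_code;
      c.bias[i] = 16.0f * step / max_code;
   }
   return c;
}
```

A YCbCr-oriented helper that applies the luma and chroma ranges to channels 0 and 1..2 would therefore give G and B the wrong scale when used for RGB.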
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41787>
Moves dispatch of multiple draws into `kk_draw`. This allows for any draw
pre-processing to operate on the full set of draws at once, reducing dispatch
calls and maximizing parallel work.
Draw data may also specify predicates that need to be applied to the draws.
This, along with batched draw processing, will be useful for implementing
features like conditional rendering later.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41799>
Set this to false for non-video queues, like the Nvidia driver does.
This prevents debug warnings that
VK_STRUCTURE_TYPE_QUEUE_FAMILY_QUERY_RESULT_STATUS_PROPERTIES_KHR
is not handled when we enable KHR_video_queue.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41752>