Instead of going through the ureg / TGSI tokens / r300_tgsi_to_rc
parsing round-trip, walk the ntr_insn list and rc_constants_add /
rc_insert_new_instruction the result straight into the
radeon_compiler that the caller passes in. Translation reuses the
rc_translate_* helpers extracted in the previous commit.
Changes touching the surrounding code:
- nir_to_rc returns void and takes a struct radeon_compiler *.
- ntr_compile tracks immediates and UBO size in its own
util_dynarray instead of relying on ureg_DECL_immediate /
ureg_DECL_constant2D's bookkeeping.
- ntr_output_decl tracks the FS output color/depth indices so
nir_to_rc can populate compiler->OutputColor[] /
compiler->OutputDepth at the end - find_output_registers is gone.
- r300_translate_{fragment,vertex}_shader drop the tgsi_scan_shader
+ r300_tgsi_to_rc + ttr.error dance and switch to checking
compiler.Base.Error.
- write_all (gl_FragColor vs gl_FragData[0]) now comes from a NIR
walk in r300_translate_fragment_shader rather than reading the
TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS property.
- r300_tgsi_to_rc.{c,h} are deleted, meson.build updated, and the
obsolete header includes go away in r300_fs.c / r300_vs.c.
Assisted-by: Codex (GPT-5.5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
Pull translate_opcode, translate_register_file, translate_saturate
and the texture-target switch out of r300_tgsi_to_rc into
nir_to_rc.h as static inline rc_translate_* helpers. r300_tgsi_to_rc
now uses them. This is preparation for direct RC emission from
nir_to_rc.c.
Assisted-by: Codex (GPT-5.5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
This was moved from r300_optimize_nir previously because that was
called in finalize_nir and thus could be called more than once. This is
not the case anymore. Also drop the stale nine optimization.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
Stop relying on ureg_DECL_immediate's "expand" path to fill earlier
TGSI immediates' unused components with values from later load_const
instructions and depend on a later backend pass to do it.
Mostly a wash on shader-db: sub-0.1% regressions on inst/cycles/
consts on RV530/RV370/RV410, with one LOST shader on RV370
(trine/fp-17.shader_test FS).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
Drop the TGSI ureg roundtrip in r300_dummy_fragment_shader and
construct the (0, 0, 0, 1) FS straight via nir_builder, matching
the rest of the compile pipeline that already runs on NIR.
Assisted-by: Codex (GPT-5.5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
The NIR IO intrinsics already carry the locations and register bases
used for the generated declarations, so fill r300_shader_semantics while
emitting the NIR loads and stores.
This removes the FS input semantic scan and lets the VS output setup use
the same NIR-derived information. Track the total number of used
inputs/outputs as well.
Also stop depending on tgsi_info for the external constants.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
Always convert TGSI shaders to NIR up-front in r300_create_{fs,vs}_state
so the rest of the compile pipeline only ever has to deal with NIR. The
TGSI->RC translation in r300_translate_{fragment,vertex}_shader now
always goes through nir_to_rc.
This requires shifting r300_blitter_draw_rectangle's sprite_coord_enable
bit from 0 to 9. The blitter's GENERIC[0] FS input now lands at
fs_inputs->generic[9] after the +9 shift in ntr_fixup_varying_slots, so
the rasterizer's sprite_coord_enable check needs the matching bit.
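
As a rough illustration of the bit change described above, the following sketch computes a sprite_coord_enable bit for a generic varying after the +9 shift. The helper name and the GENERIC_SLOT_SHIFT macro are hypothetical and are not taken from the r300 sources.

```c
#include <stdint.h>

/* Assumption: generic varyings are shifted by 9 slots, as the commit
 * message describes for ntr_fixup_varying_slots. */
#define GENERIC_SLOT_SHIFT 9

/* Hypothetical helper: the sprite_coord_enable bit that matches
 * GENERIC[generic_index] after the shift. */
static inline uint32_t sprite_coord_bit(unsigned generic_index)
{
    return UINT32_C(1) << (generic_index + GENERIC_SLOT_SHIFT);
}
```

With this sketch, GENERIC[0] maps to bit 9 rather than bit 0.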
The draw path still needs TGSI, so we convert it back explicitly for
now. The deTGSIzation of draw paths will come later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41577>
We don't usually document the details about the various fields here.
Let's drop these, as the comments don't exist in future docs either,
making it a bit easier to diff them.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
The distinction between 8 and 4 bit here is kinda meaningless, because
the extra bits are zero regardless. This matches what the spec says, and
is also consistent with the other XML definitions we have. So let's make
it consistent so we can more easily diff the XML files to see what
changed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
Most of the XML files do this, so let's do the same here. This
shouldn't matter in practice, as we always set the field anyway.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This makes things more consistent, but shouldn't make a practical
difference apart from reducing needless diffs.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
We don't specify these for V10, and the default is the zero-value
anyway. Let's drop them to simplify things and reduce needless diffs.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
We do this for all masks from V12 and later, but not always in earlier
gens. Let's fix this up to both produce cleaner dumps and reduce
needless diffs between the files.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This is what cs_sync32_add etc. expect. Not sure why this isn't
producing compile-time errors, but we should be consistent here anyway.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This is what we're doing for later gens, so let's be consistent. While
we're at it, fix up the casing of the names as well.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
"Scissor mode" is the name that the HW spec gives to this bit, but it's
more accurately described as "Scissor to bounding box", which is what we
use for V9 and V10. It seems that name was picked intentionally over what the
HW spec calls it.
We should do the same for later gens as well, as this keeps the code
simpler.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
The field here is 8 bits wide according to the spec. However, because all
values that take more than two bits are reserved, this doesn't actually
lead to any misbehavior. But let's make this consistent with the spec
and newer XML files.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
This field is specified to be 4 bits wide, not 3. This doesn't make a
practical difference, because all values with the top bit set are
undefined, so it will always be zero. But we should get it right to
reduce needless diffs here.
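
To illustrate why the width change is harmless when packing, here is a minimal sketch of a generic field packer. pack_field is a hypothetical helper, not the generated pack code, and the start bit used in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place `value` into a field of `width` bits that
 * begins at bit `start` of a 32-bit word. */
static inline uint32_t pack_field(uint32_t value, unsigned start, unsigned width)
{
    assert(width >= 1 && width <= 32 && start + width <= 32);
    uint32_t mask = width == 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    /* The value must fit into the declared width. */
    assert((value & ~mask) == 0);
    return value << start;
}
```

For any value below 8, pack_field(value, 4, 3) and pack_field(value, 4, 4) produce the same word, because the extra top bit is always zero.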
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41721>
Most of the work to support predicating draws is already done; we mainly
just need to support predicating dispatches and wire it up.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41839>
The scheduler expects that dest values that are marked as pin_group
are used as src values in some instruction that takes a vec4 as source,
otherwise the free channels in the vec4 group are not evaluated correctly.
Fix the extra instructions when lowering buf_txf to backend IR to use free
ALU dest registers.
Fixes: 13b1069a87 ("r600/sfn: Handle pre-EG buffer fetch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15433
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41835>
If we already have 2.0, do not add a separate -2.0. The negate is free
if it is only used as a scalar. We could also do this for constant
vectors, but only for vs, since for fs, we can have only per source
negate, not per channel, so keep it simple for now.
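
The scalar case can be sketched roughly as follows. This is a simplified illustration, not the code from this commit: struct scalar_const_ref and find_scalar_const are hypothetical names, and the constants are modeled as a flat float array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result: which slot to read and whether to negate it. */
struct scalar_const_ref {
    int index;   /* slot index, or -1 if nothing usable was found */
    bool negate; /* true if the source negate modifier is needed */
};

/* Look for `value` among the existing scalar constants. An exact match
 * wins; otherwise the first slot holding -value is reused with negate. */
static struct scalar_const_ref
find_scalar_const(const float *consts, size_t count, float value)
{
    struct scalar_const_ref ref = { -1, false };

    for (size_t i = 0; i < count; i++) {
        if (consts[i] == value) {
            ref.index = (int)i;
            ref.negate = false;
            return ref;
        }
        if (ref.index < 0 && consts[i] == -value) {
            ref.index = (int)i;
            ref.negate = true;
        }
    }
    return ref;
}
```

A caller would only add a new constant when the returned index is -1.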
Shader-db RV410:
total consts in shared programs: 86348 -> 86272 (-0.09%)
consts in affected programs: 6036 -> 5960 (-1.26%)
helped: 76
HURT: 0
total cycles in shared programs: 175335 -> 175332 (<.01%)
cycles in affected programs: 10868 -> 10865 (-0.03%)
helped: 56
HURT: 30
total temps in shared programs: 19487 -> 19510 (0.12%)
temps in affected programs: 362 -> 385 (6.35%)
helped: 9
HURT: 8
total instructions in shared programs: 118451 -> 118461 (<.01%)
instructions in affected programs: 8105 -> 8115 (0.12%)
helped: 51
HURT: 29
LOST: 3
GAINED: 8
Most notably we again compile all glamor shaders, gain 2 tropics ones
and trade 3 lost for 2 gained in gsk, which doesn't matter much since it
will fall back to software after the first linking failure anyway.
RV530:
total cycles in shared programs: 191425 -> 191385 (-0.02%)
cycles in affected programs: 6249 -> 6209 (-0.64%)
helped: 45
HURT: 12
total consts in shared programs: 94030 -> 93967 (-0.07%)
consts in affected programs: 5803 -> 5740 (-1.09%)
helped: 63
HURT: 0
total temps in shared programs: 17037 -> 17040 (0.02%)
temps in affected programs: 49 -> 52 (6.12%)
helped: 1
HURT: 3
total instructions in shared programs: 128823 -> 128789 (-0.03%)
instructions in affected programs: 6164 -> 6130 (-0.55%)
helped: 49
HURT: 19
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41618>
After removing the TGSI layer, load_const values will be emitted directly
as RC immediates without the scalar packing that tgsi_ureg used to do.
This can push fragment shaders past the 32-slot hardware limit on R3xx/R4xx.
Swap dead_constants and dataflow_swizzles pass order so constant
compaction runs before swizzle legalization, giving the legalization
pass an accurate slot count to work with.
In rc_remove_unused_constants, when the slot budget is tight on
R3xx/R4xx, enable aggressive packing for vec-used immediates.
Deduplicate repeated values within an immediate and merge subsequent vec
immediates into existing slots by matching values and filling free
channels.
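
The packing idea can be sketched as below. This is a simplified model under stated assumptions, not the rc_remove_unused_constants implementation: the imm_slot and imm_pool types and both functions are hypothetical, each immediate has at most 4 components, and swizzle legality is ignored.

```c
#include <stdbool.h>

#define MAX_SLOTS 32 /* matches the 32-slot limit mentioned above */

struct imm_slot {
    float value[4];
    unsigned used; /* bitmask of occupied channels */
};

struct imm_pool {
    struct imm_slot slots[MAX_SLOTS];
    unsigned count;
};

/* Try to place n values (n <= 4) into one slot, reusing channels that
 * already hold a matching value and filling free channels otherwise.
 * On success the slot is updated and swz[i] names the channel of vals[i]. */
static bool try_place(struct imm_slot *slot, const float *vals,
                      unsigned n, unsigned char *swz)
{
    struct imm_slot tmp = *slot; /* commit only if everything fits */

    for (unsigned i = 0; i < n; i++) {
        int chan = -1;
        for (unsigned c = 0; c < 4; c++) {
            if ((tmp.used & (1u << c)) && tmp.value[c] == vals[i]) {
                chan = (int)c;
                break;
            }
        }
        if (chan < 0) {
            for (unsigned c = 0; c < 4; c++) {
                if (!(tmp.used & (1u << c))) {
                    chan = (int)c;
                    break;
                }
            }
            if (chan < 0)
                return false; /* no match and no free channel */
            tmp.value[chan] = vals[i];
            tmp.used |= 1u << chan;
        }
        swz[i] = (unsigned char)chan;
    }
    *slot = tmp;
    return true;
}

/* Returns the slot index used for the immediate, or -1 if the pool is full. */
static int imm_pool_add(struct imm_pool *pool, const float *vals,
                        unsigned n, unsigned char *swz)
{
    for (unsigned s = 0; s < pool->count; s++) {
        if (try_place(&pool->slots[s], vals, n, swz))
            return (int)s;
    }
    if (pool->count == MAX_SLOTS)
        return -1;
    pool->slots[pool->count] = (struct imm_slot){0};
    try_place(&pool->slots[pool->count], vals, n, swz);
    return (int)pool->count++;
}
```

In this sketch, adding (1, 1, 0, 0) occupies only two channels of a slot, and a following (0, 2) reuses the existing 0 and takes one free channel in the same slot.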
Very small win on R5xx and very small hit on R3xx/R4xx (due to the smaller
number of legal swizzles).
Shader-db RV530:
total cycles in shared programs: 191452 -> 191425 (-0.01%)
cycles in affected programs: 5168 -> 5141 (-0.52%)
helped: 24
HURT: 10
total temps in shared programs: 17046 -> 17037 (-0.05%)
temps in affected programs: 201 -> 192 (-4.48%)
helped: 11
HURT: 5
total consts in shared programs: 94033 -> 94030 (<.01%)
consts in affected programs: 277 -> 274 (-1.08%)
helped: 5
HURT: 5
total instructions in shared programs: 128840 -> 128823 (-0.01%)
instructions in affected programs: 3588 -> 3571 (-0.47%)
helped: 25
HURT: 12
RV410:
total cycles in shared programs: 176230 -> 176270 (0.02%)
cycles in affected programs: 20598 -> 20638 (0.19%)
helped: 51
HURT: 66
total temps in shared programs: 19655 -> 19650 (-0.03%)
temps in affected programs: 1310 -> 1305 (-0.38%)
helped: 37
HURT: 25
total instructions in shared programs: 119346 -> 119379 (0.03%)
instructions in affected programs: 13884 -> 13917 (0.24%)
helped: 58
HURT: 65
total consts in shared programs: 86146 -> 86412 (0.31%)
consts in affected programs: 3093 -> 3359 (8.60%)
helped: 8
HURT: 182
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41618>
This uses global_atomic_ordered_add_b64 to implement Ordered Append
and experimentally measure its memory throughput.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41452>
The xe.ko kernel driver sets the ROW_CHICKEN bit to disable Early EOT
on all revisions of Xe2, I believe as far back as 6.10. Although Xe2
doesn't have the variable registers per thread feature of Xe3, it still
has Large GRF mode that can be switched on and off, and there are issues
with combining the two features. Plus, apparently this wasn't observed
to help much with performance.
That means that EOT sends are no longer special, and we don't need to
restrict ourselves to r112-r127. Relax the validator so Jay can use this.
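
A minimal sketch of the kind of check being relaxed, assuming a hypothetical validator helper. The function name, its parameters, and the way the restriction is toggled are illustrative only and do not reflect the actual validator code.

```c
#include <stdbool.h>

/* Hypothetical check: when the restriction applies, the EOT send payload
 * must lie entirely within r112-r127; otherwise any register is accepted. */
static bool eot_send_src_ok(unsigned first_reg, unsigned len,
                            bool restrict_to_high_grfs)
{
    if (!restrict_to_high_grfs)
        return true;
    return first_reg >= 112 && first_reg + len <= 128;
}
```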
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Co-authored-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41831>
The existing util_get_narrow_range_coeffs doesn't work for RGB, since
all channels in RGB will share the same scale and bias.
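
For illustration, a sketch of RGB coefficients with one scale and bias shared by all channels. It assumes the conventional narrow range of 16..235 at 8 bits (scaled up for higher bit depths) and a full-to-narrow mapping on normalized values. narrow_range_rgb_coeffs is a hypothetical name, not the helper added by this commit.

```c
/* Hypothetical helper: coefficients such that
 *   narrow = full * scale + bias
 * on normalized [0, 1] values. Assumes bit_depth >= 8. */
static void narrow_range_rgb_coeffs(unsigned bit_depth,
                                    float *scale, float *bias)
{
    const float max = (float)((1u << bit_depth) - 1);
    const float lo = (float)(16u << (bit_depth - 8));
    const float hi = (float)(235u << (bit_depth - 8));

    /* R, G and B all use the same range, unlike luma vs. chroma. */
    *scale = (hi - lo) / max;
    *bias = lo / max;
}
```

At 8 bits this gives a scale of 219/255 and a bias of 16/255 for every channel.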
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41787>