Implement parsing and encoding of special floating-point values for
both 20-bit (f20) and 16-bit (f16) immediate formats:
inf:f20 - Positive infinity (imm_val=0x7f800, imm_type=0)
-inf:f20 - Negative infinity (imm_val=0xff800, imm_type=0)
nan:f20 - Quiet NaN (imm_val=0x7fc00, imm_type=0)
-nan:f20 - Negative NaN (imm_val=0xffc00, imm_type=0)
inf:f16 - Positive infinity (imm_val=0x7c00, imm_type=3)
-inf:f16 - Negative infinity (imm_val=0xfc00, imm_type=3)
nan:f16 - Quiet NaN (imm_val=0x7fff, imm_type=3)
-nan:f16 - Negative NaN (imm_val=0xffff, imm_type=3)
The f20 format stores the upper 20 bits of an IEEE 754 single-precision
float. The f16 format stores the 16-bit half-float value directly.
This enables round-trip assembly of shaders containing these special
values, which can appear in GPU command streams captured from the
proprietary driver.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39016>
The previous code incorrectly treated f16 immediates as truncated f32
values (bits >> 12). The f16 immediate format (imm_type=3) expects a
16-bit IEEE 754 half-precision float, not the upper 20 bits of an f32.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39016>
The dual16 mode is now fully encoded in the disassembly output through
type suffixes (:f16 vs :f20) and register naming (th vs t), making the
explicit dual16 mode parameter redundant.
Previously, the parser needed a dual16_mode flag to determine whether
float immediates should use imm_type=0 (f32) or imm_type=3 (f16). Now
that the disassembler emits explicit :f16/:f20 suffixes, the parser can
determine the correct encoding directly from the input text.
This simplifies the API by removing the dual16_mode parameter from:
- isa_parse_str()
- isa_parse_file()
- Internal parsing functions
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39016>
The assembler and disassembler now use explicit type suffixes for all
immediate values to ensure correct round-trip encoding:
- :f20 - 20-bit float (upper 20 bits of IEEE 754 single precision)
- :f16 - 16-bit half-float
- :s20 - 20-bit signed integer (two's complement)
- :u20 - 20-bit unsigned integer
Previously, the parser did not distinguish between signed and unsigned
integers, causing incorrect encoding. The signed format uses 20-bit
two's complement where bit 19 is the sign bit and maps to AMODE[0] in
the instruction encoding.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39016>
When a parsing test fails, output the actual error message to aid
debugging. This makes it immediately clear why parsing failed instead
of just showing that the test didn't succeed.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39016>
To not print a warning about missing SPM by default on < GFX10.
Also move the function to radv_physical_device.c and make it non-static.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39065>
Currently the sm120 instruction latency code expects registers to be out
of SSA. This prerequisite is broken with the prepass scheduler.
This commit removes non-SSA-specific code.
Fixes: b55b8da012 ("nak: Add a prepass instruction scheduler")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39072>
This helper was introduced in b4bac84d3b ("nak: Add a Dst::file()
helper function") which missed updating the sm120 file.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39074>
Initialize gang CS on unsupported transfer operations.
Add a wait when:
- SDMA needs to wait for previous transfer operations on ACE
- ACE needs to wait for previous transfer operations on SDMA
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
They will be called from the transfer copy functions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
We need to use gang semaphores in the following two scenarios:
1. Leader to follower semaphore:
Increment the leader to follower semaphore when the leader wants
to block the follower: a transfer operation on ACE needs to wait
for a previous operation on SDMA.
2. Follower to leader semaphore:
Increment the follower to leader semaphore when the follower wants
to block the leader: a transfer operation on SDMA needs to wait
for a previous operation on ACE.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
Change the explanation to use "leader" and "follower" terminology.
Explain better how it is used with GFX/ACE and SDMA/ACE.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
RADV's transfer queue implementation will use compute for
the transfer operations that aren't supported by the SDMA,
so we'll need gang submissions for that.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
The following are not supported by SDMA:
- Sparse images (aka. PRT) on older GPUs
- Multisampled images
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
When the "gang leader" is SDMA, we need to ensure that the
gang semaphores BO is coherent between SDMA and CP.
To achieve this, we need bypass the L2 cache when either SDMA
or CP are connected to L2.
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
Since we use atomic mode setting now, the wsi->fd we use needs to have
the atomic client cap.
There are several different code paths where wsi can acquire a file
descriptor. For drm masters, the atomic client cap is set in
wsi_display_init_wsi. For leased drm fds, there are AcquireDrmDisplayEXT
and AcquireXlibDisplayEXT.
According to a comment we previously assumed wsi_display_get_connector
is common among all code paths, and that's why the atomic client cap was
set there. But that assumption can be broken based on the particular
order which the application invokes vulkan APIs in.
This commit simply push the drmSetClientCap to all entrypoints where a
drm fd comes through.
Fixes: 513ffea1d3 ("wsi/display: use atomic mode setting")
Signed-off-by: Yuxuan Shui <yshui@codeweavers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38987>
Replace draw_entry_bytes with AC_TASK_DRAW_ENTRY_BYTES.
This is 16 on all AMD HW that supports task/mesh shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032>
The pass now uses the ring descriptors to figure these out.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032>
Currently the number of task payload entry size is hardcoded
in shaders as a constant. This isn't a good idea because it
makes the code inflexible, eg. doesn't allow us
to change the number of entries dynamically.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032>
Currently the number of task shader ring entries is hardcoded
in shaders as a constant. This isn't a good idea because it
makes the code inflexible, eg. prevents us from using the same
shader binary accross some chips as well as doesn't allow us
to change the number of entries dynamically.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032>
Move the update of pan_shader_info into its own function. For now this
does nothing, but it will allow us to update the info before printing
stats, which will be useful in the future.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38961>
This isn't very useful yet (the info isn't fully filled
out at the point of the call) so the printed values should
be viewed skeptically.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38961>
We need a subset of the pan_shader_info struct for each variant,
and previously we were building a struct containing that subset and
passing the struct to bi_compile_variant_nir. Instead we should pass
a pointer to the whole struct and let bi_compile_variant_nir build the
substructure. This is both more efficient, and also gives the stats
code access to the full information.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38961>
Add the actual registers used (including uniforms used) by the shader
to stats. This is calculated by the stats gathering code, because the
scheduler and scoreboard passes run after register allocation and can
sometimes change the results.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38961>
These aren't new in RGP 2.6, they have been added since a while. But
because RADV wasn't supporting the new derived SPM chunk it wasn't
possible to expose them.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013>
This is the new method to add performance counters to RGP captures.
This will be used to add the new RGP 2.6 counters too.
The previous SPM code will be deprecated at some point but it's hard
to support all generations in one batch. So, I will implement this
step by step.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013>
Some blocks have two or more SPM counters and they should be used when
more than 4 counters are programmed (ie. 16-bit per counter).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39013>