Commit graph

886 commits

Author SHA1 Message Date
Juan A. Suarez Romero
7dc6b8df11 broadcom/compiler: use unsigned types when performing bitshifting
Ensure unsigned integers are used instead of signed ones when performing
left bit shifts.

This has been detected by the Undefined Behaviour Sanitizer (UBSan).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29911>
2024-07-01 08:02:07 +00:00
David Heidelberg
68215332a8 build: pass licensing information in SPDX form
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29972>
2024-06-29 12:42:49 -07:00
Iago Toral Quiroga
4e6b675974 broadcom/compiler: drop multop if we dce umul24
We always emit multop+umul24 to implement integer multiply and
this is the only scenario in which we use multop, so if we decide
to DCE umul24 we should also DCE the previous multop.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
2024-06-27 06:43:09 +00:00
Iago Toral Quiroga
0a7a36372f broadcom/compiler: validate rtop + thrsw hazard
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
2024-06-27 06:43:09 +00:00
Iago Toral Quiroga
d1f8351f3c broadcom/compiler: fix per-quad spilling
This is not safe when we have conditional spills since we could be
spilling disabled lanes with undefined values that could overwrite
valid data for those lanes from a previous spill of the same temp
that was unconditional (or that condionally enabled those same
lanes).

Fixes some Piglit OpenCL tests as well as the following OpenCL tests:
integer_divideAssign
integer_moduloAssign
integer_mad_sat
integer_ops integer_divideAssign
integer_ops integer_mad_sat
integer_ops integer_moduloAssign
integer_ops quick_char_math
integer_ops quick_short_math
math_brute_force half_powr
math_brute_force pow
math_brute_force pown
math_brute_force powr
math_brute_force rootn

Fixes: 597560e27c ('broadcom/compiler: always enable per-quad on spill operations')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
2024-06-27 06:43:09 +00:00
Iago Toral Quiroga
38b7f411a1 broadcom/compiler: don't spill in between multop and umul24
The multop instruction implicitly writes rtop which is not preserved
acrosss thread switches. We can spill the sources of the multop
(since these would happen before multop) and the destination of
umul24 (since that would happen after umul24).

Fixes some OpenCL tests when V3D_DEBUG=opt_compile_time is used to
choose a different compile configuration.

cc: mesa-stable

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
2024-06-27 06:43:09 +00:00
Karol Herbst
742984a325 broadcom/compiler: handle variable shared memory
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:03 +00:00
Karol Herbst
9bf0b3a112 broadcom/compiler: call nir_lower_64bit_phis
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:03 +00:00
Karol Herbst
4a169a518e broadcom/compiler: implement load_kernel_input
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:03 +00:00
Karol Herbst
caa3872f76 broadcom/compiler: abort on unknown intrinsics
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:03 +00:00
Karol Herbst
f8ab9c0e93 broadcom/compiler: handle up to vec16 load_uniforms
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:03 +00:00
Karol Herbst
e050b13777 broadcom/compiler: try handling 8/16 bit alu operations
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:02 +00:00
Karol Herbst
c7f9cca985 broadcom/compiler: fix iu2f32 for 8 and 16 bit inputs
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:02 +00:00
Karol Herbst
214121e9b0 broadcom/compiler: handle fp16 conversion ops
As long as fp16 isn't advertized it's not doing much, but it also doesn't
hurt to add them.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:02 +00:00
Karol Herbst
c2ec65eeda broadcom/compiler: add generated v3d_nir_lower_algebraic
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>
2024-06-26 10:04:02 +00:00
Alyssa Rosenzweig
da752ed7c1 treewide: use nir_def_replace sometimes
Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but
it's a start.

Coccinelle patch:

    @@
    expression intr, repl;
    @@

    -nir_def_rewrite_uses(&intr->def, repl);
    -nir_instr_remove(&intr->instr);
    +nir_def_replace(&intr->def, repl);

Coccinelle patch:

    @@
    identifier intr;
    expression instr, repl;
    @@

    nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    ...
    -nir_def_rewrite_uses(&intr->def, repl);
    -nir_instr_remove(instr);
    +nir_def_replace(&intr->def, repl);

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna]
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>
2024-06-21 15:36:56 +00:00
Iago Toral Quiroga
02f33b7d92 broadcom/compiler: initialize payload_conflict for all initial nodes
Fixes: cb83f25b39 ('broadcom/compiler: don't assign payload registers to spilling setup temps')
cc: mesa-stable

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29759>
2024-06-18 07:19:07 +00:00
Alyssa Rosenzweig
15257b65c6 treewide: use nir_metadata_control_flow
Via Coccinelle patch:

    @@
    @@

    -nir_metadata_block_index | nir_metadata_dominance
    +nir_metadata_control_flow

...plus some manual fixups for call sites missed by coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>
2024-06-17 16:28:14 -04:00
Daniel Schürmann
7af16e9f1e nir/shader_info: remove uses_demote
This flag is mostly redundant with uses_discard and was only
introduced to implement demote with LLVM when it didn't have
that intrinsic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
2024-06-17 19:37:16 +00:00
Daniel Schürmann
9b1a748b5e nir: remove nir_intrinsic_discard
The semantics of discard differ between GLSL and HLSL and
their various implementations. Subsequently, numerous application
bugs occurred and SPV_EXT_demote_to_helper_invocation was written
in order to clarify the behavior. In NIR, we now have 3 different
intrinsics for 2 things, and while demote and terminate have clear
semantics, discard still doesn't and can mean either of the two.

This patch entirely removes nir_intrinsic_discard and
nir_intrinsic_discard_if and replaces all occurences either with
nir_intrinsic_terminate{_if} or nir_intrinsic_demote{_if} in the
case that the NIR option 'discard_is_demote' is being set.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
2024-06-17 19:37:16 +00:00
Karol Herbst
05b9705ae0 broadcom/compiler: rework scratch lowering
Let's rely on nir_lower_mem_access_bit_sizes doing all the heavy work, so
v3d_nir_lower_scratch can be cleaned up quite a lot.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29711>
2024-06-17 10:07:56 +00:00
Karol Herbst
75196e86f1 broadcom/compiler: only handle load_uniform explicitly in v3d_nir_lower_load_store_bitsize
Also use nir_get_io_offset_src_number while at it.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29711>
2024-06-17 10:07:56 +00:00
Karol Herbst
a2eff2b9f9 broadcom/compiler: convert 2x32 global operations to scalar variants
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29711>
2024-06-17 10:07:56 +00:00
Karol Herbst
9827cfe49e broadcom/compiler: use nir_lower_mem_access_bit_sizes for memory lowering
It does everything we need and allows us to remove a lot of code. It also
helps with supporting vec8/16 and unaligned load/stores for OpenCL.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29711>
2024-06-17 10:07:56 +00:00
Karol Herbst
66b58e8a0e broadcom/compiler: support global load/store intrinsics
It's the same as global_2x32 as there the 2nd component is ignored anyway

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29711>
2024-06-17 10:07:56 +00:00
Karol Herbst
83883a6cc2 broadcom/compiler: handle load_workgroup_size
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29554>
2024-06-06 12:01:00 +00:00
Iago Toral Quiroga
c30833f233 broadcom/compiler: check if vertex shader writes point size
The same we already check for geometry shaders. We will use this
shortly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29413>
2024-05-28 05:31:13 +00:00
Iago Toral Quiroga
865e682ad7 broadcom/compiler: apply payload conflict to spill setup before RA
We can emit spill setup before RA if we use scratch. In that case
we have the same situation as during spilling, with the caveat that
we have already emitted the instructions so we need to find them
(they should be the only instructions ones before the instructions
accessing payload registers) and flag them as such.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
2024-05-24 05:25:22 +00:00
Iago Toral Quiroga
cb83f25b39 broadcom/compiler: don't assign payload registers to spilling setup temps
We read our payload registers first in the shader so we generally don't have
to care about temps being allocated to them and stomping their value before
we can read them. Hoewer, spilling setup instructions are an exception since
these will be inserted first when there is any spilling in the program.
To fix this, we flag RA nodes involved with these instructions so we can
then try to avoid assiging these registers to them.

Fixes CTS failures with V3D_DEBUG=opt_compile_time, particularly:
dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.convertcheckuv2.nostore.single.std140.comp_offset_nonzero

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
2024-05-24 05:25:22 +00:00
Iago Toral Quiroga
901c485997 broadcom/compiler: make add_node return the node index
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
2024-05-24 05:25:21 +00:00
Juan A. Suarez Romero
87cd11ecd2 nir,v3d: rename tlb_color_v3d intrinsic
As this is intended to be used also by VC4, change the suffix to
something more convenient, like tlb_color_brcm.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29119>
2024-05-13 10:44:17 +00:00
Juan A. Suarez Romero
920025533e broadcom/compiler: do not run lowering I/O for FS
This lowering is applied only for vertex and geometry shaders. So detect
earlier this situation and do not go ahead with other shader.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29102>
2024-05-09 07:55:00 +00:00
Iago Toral Quiroga
1545dc94b4 broadcom/compiler: simplify v3d_vir_emit_tex
In the past where backends had to deal with nir_register we needed
a specific path for them here because nir_def_components_read only
worked on ssa defs, but now that we got rid of nir_register we
only have nir_def and we don't need to go out of our way to do this
and we can just always use nir_def_components_read.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
2024-05-09 09:29:44 +02:00
Iago Toral Quiroga
c24a149d2d broadcom/compiler: don't read excess channels on image loads
This is similar to what we do for textures where we program a number
of channels matching the number of componentes actually read by the
shader.

Makes tests like dEQP-VK.image.load_store.with_format.2d.r32_uint
drop from 18 instructions to 14 by emitting a single ldtmu instead
of 4.

From dEQP-VK.image.load_store.*:

total instructions in shared programs: 12681 -> 12093 (-4.64%)
instructions in affected programs: 4866 -> 4278 (-12.08%)
helped: 256
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.30 x̃: 2
helped stats (rel) min: 4.76% max: 25.00% x̄: 12.35% x̃: 11.11%
95% mean confidence interval for instructions value: -2.40 -2.19
95% mean confidence interval for instructions %-change: -12.99% -11.71%
Instructions are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
2024-05-09 09:29:44 +02:00
Iago Toral Quiroga
ae7f20d8d4 broadcom/compiler: assert on array overflow
If a shader has too may outputs overflowing the array may
overwrite other pieces of state, which can then be tricky to
debug.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
2024-05-09 09:29:44 +02:00
Juan A. Suarez Romero
9f72e22230 broadcom/compiler: remove unused parameters in vpm read
A few of the parameters are not actually used at all. So let's clean
them.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29101>
2024-05-08 14:13:07 +00:00
Marek Olšák
1632948a76 nir: validate src_type of store_output intrinsics, require bit_size >= 16
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28845>
2024-05-01 19:41:35 +00:00
Karol Herbst
d22f936019 nir: remove workgroup_id_zero_base
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
2024-04-24 20:18:49 +00:00
Iago Toral Quiroga
1070c9b0e7 broadcom/compiler: enable perquad with uses_wide_subgroup_intrinsics
This fixes a number of regressions in Vulkan subgroups tests in CTS.

Fixes: 97f5721bfc ('broadcom/compiler: needs_quad_helper_invocation enable PER_QUAD TMU access')
cc: mesa-stable

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28797>
2024-04-18 09:00:56 +00:00
Jose Maria Casanova Crespo
97f5721bfc broadcom/compiler: needs_quad_helper_invocation enable PER_QUAD TMU access
We take advantage of the needs_quad_helper_invocation information to
only enable the PER_QUAD TMU access on Fragment Shaders when it is needed.

PER_QUAD access is also disabled on stages different to fragment shader.
Being enabled was causing MMU errors when TMU was doing indexed by vertexid
reads on disabled lanes on vertex stage. This problem was exercised by some
shaders from the GTK new GSK_RENDERER=ngl that were accessing a constant buffer
offset[6], but having PER_QUAD enabled on the TMU access by VertexID was
doing hidden incorrect access to not existing vertex 6 and 7 as TMU was
accessing the full quad.

cc: mesa-stable

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28740>
2024-04-16 08:20:54 +00:00
Mike Blumenkrantz
7760642d2e v3d: set use_clipdist_array=true for lower_clip?
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28308>
2024-04-11 18:57:25 +00:00
Iago Toral Quiroga
9fad2922fb broadcom/compiler: fix workaround for GFXH-1602
In this scenario drivers are adding a dummy attribute with a size
of 1, so we should account for it here.

Fixes missing window decorations with GTK4+.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10853
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28414>
2024-04-01 09:22:39 +00:00
Yonggang Luo
1ac1c0843f treewide: Replace usage of macro DEBUG with MESA_DEBUG when possible
This is achieved by the following steps:

#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG

This is done by replace in vscode

excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu

These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.

Use debug or NDEBUG instead of DEBUG in comments when proper

This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG

These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe

Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually

DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)

Replace the DEBUG present in comment with proper new MESA_DEBUG manually

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
2024-03-22 18:22:34 +00:00
Juan A. Suarez Romero
69fbd5cb90 v3d: fix line coords with perspective projection
The algorithm used to rendering smooth lines worked under the assumption
that line coords were in the [0, 1] range. This was correct when using
an orthogonal projection, but not when using a perspective projection.

With a perspective projection (where the value for 1/Wc set in the VPM
is not 1.0), line coords values are also affected by this projection, so
the values are not in this range.

To deal with this, we normalize the line coords using the Wc value so
the range becomes [0, 1], and the smooth line rendering works as
expected.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10496
Fixes: ee4d51f8b2 ("v3d: Add a lowering pass for line smoothing")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
2024-03-11 12:42:50 +00:00
Juan A. Suarez Romero
62e1dff256 v3d: add load_fep_w_v3d intrinsic
This intrinsic helps to read the W coordinate stored in the QPU register
when initializing the input data for the fragment shaders.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
2024-03-11 12:42:49 +00:00
Juan A. Suarez Romero
7eae0e03f1 broadcom/compiler: fix SFU check for 7.1
Avoid SFU op when the result would land in other thread.

total instructions in shared programs: 691634 -> 691928 (0.04%)
instructions in affected programs: 44888 -> 45182 (0.65%)
helped: 17
HURT: 211
helped stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1
helped stats (rel) min: 0.19% max: 1.96% x̄: 0.74% x̃: 0.71%
HURT stats (abs)   min: 1 max: 8 x̄: 1.48 x̃: 1
HURT stats (rel)   min: 0.06% max: 14.29% x̄: 2.15% x̃: 1.11%
95% mean confidence interval for instructions value: 1.14 1.44
95% mean confidence interval for instructions %-change: 1.62% 2.24%
Instructions are HURT.

total max-temps in shared programs: 133794 -> 133804 (<.01%)
max-temps in affected programs: 237 -> 247 (4.22%)
helped: 0
HURT: 10
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.72% max: 12.50% x̄: 6.52% x̃: 4.63%
95% mean confidence interval for max-temps value: 1.00 1.00
95% mean confidence interval for max-temps %-change: 3.37% 9.66%
Max-temps are HURT.

total sfu-stalls in shared programs: 818 -> 766 (-6.36%)
sfu-stalls in affected programs: 164 -> 112 (-31.71%)
helped: 79
HURT: 26
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 10.00% max: 100.00% x̄: 76.52% x̃: 100.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.04 x̃: 1
HURT stats (rel)   min: 0.00% max: 100.00% x̄: 12.18% x̃: 0.00%
95% mean confidence interval for sfu-stalls value: -0.67 -0.32
95% mean confidence interval for sfu-stalls %-change: -64.00% -45.11%
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 692452 -> 692694 (0.03%)
inst-and-stalls in affected programs: 36509 -> 36751 (0.66%)
helped: 9
HURT: 181
helped stats (abs) min: 1 max: 2 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.19% max: 1.96% x̄: 0.74% x̃: 0.71%
HURT stats (abs)   min: 1 max: 8 x̄: 1.39 x̃: 1
HURT stats (rel)   min: 0.06% max: 6.25% x̄: 1.57% x̃: 1.09%
95% mean confidence interval for inst-and-stalls value: 1.12 1.43
95% mean confidence interval for inst-and-stalls %-change: 1.27% 1.66%
Inst-and-stalls are HURT.

total nops in shared programs: 25075 -> 25154 (0.32%)
nops in affected programs: 529 -> 608 (14.93%)
helped: 3
HURT: 76
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 5.26% max: 50.00% x̄: 21.75% x̃: 10.00%
HURT stats (abs)   min: 1 max: 6 x̄: 1.08 x̃: 1
HURT stats (rel)   min: 1.04% max: 200.00% x̄: 48.08% x̃: 50.00%
95% mean confidence interval for nops value: 0.84 1.16
95% mean confidence interval for nops %-change: 36.45% 54.40%
Nops are HURT.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28009>
2024-03-11 10:23:48 +00:00
Iago Toral Quiroga
cc7934a89b broadcom/compiler: fix lane selection for subgroups in fragment shaders
It seems the hardware behavior for this is as per-spec and we are
supposed to identify as active entire quads. Particularly, there
are some derivative tests with dynamic control flow that use
subgroup ballot and require this.

However, we still need to exclude terminted lanes (OpTerminate). For
that, we keep track of the sample mask at the start of a fagment
shader start and compare it with the current sample mask.

Fixes: ('broadcom/compiler: support subgroup reduction operations from fragment shaders')
Fixes: dEQP-VK.glsl.derivate.dynamic_loop.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27409>
2024-02-14 08:02:41 +01:00
Iago Toral Quiroga
e5bfce6f46 broadcom/compiler: support subgroup reduction operations from fragment shaders
In fragment shaders these instructions consider a lane active when
any lane in the same quad is active, which is not what we want, so
we need to include the current sample mask in the condition mask
used with these instructions to limit lane selection to those that
are really active.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
3544113ee2 brodcom/compiler: implement non-compute TSY barrier
The BARRIER_ID instruction is only available in compute
and tessellation, implement an equivalent barrier that we
can use from other stages.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
5b269814fc broadcom/compiler: be more careful with unifa in non-uniform control flow
If the lane from which the hardware writes the unifa address
is disabled, then we may end up with a bogus address and invalid
memory accesses from follow-up ldunifa.

Instead of always disabling unifa loads in non-uniform control
flow we can try to see if the address is prouced from a nir
register (which is the only case where we do conditional writes
under non-uniform control flow in ntq_store_def), and only
disable it in that case.

When enabling subgroups for graphics pipelines, this fixes a
GMP violation in the simulator with the following test
(which has non-uniform control flow writing unifa with lane 0
disabled, which is the lane from which the unifa takes the
address):
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcastfirst_int

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00