Commit graph

859 commits

Author SHA1 Message Date
Iago Toral Quiroga
865e682ad7 broadcom/compiler: apply payload conflict to spill setup before RA
We can emit spill setup before RA if we use scratch. In that case
we have the same situation as during spilling, with the caveat that
we have already emitted the instructions so we need to find them
(they should be the only instructions ones before the instructions
accessing payload registers) and flag them as such.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
2024-05-24 05:25:22 +00:00
Iago Toral Quiroga
cb83f25b39 broadcom/compiler: don't assign payload registers to spilling setup temps
We read our payload registers first in the shader so we generally don't have
to care about temps being allocated to them and stomping their value before
we can read them. Hoewer, spilling setup instructions are an exception since
these will be inserted first when there is any spilling in the program.
To fix this, we flag RA nodes involved with these instructions so we can
then try to avoid assiging these registers to them.

Fixes CTS failures with V3D_DEBUG=opt_compile_time, particularly:
dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.convertcheckuv2.nostore.single.std140.comp_offset_nonzero

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
2024-05-24 05:25:22 +00:00
Iago Toral Quiroga
901c485997 broadcom/compiler: make add_node return the node index
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
2024-05-24 05:25:21 +00:00
Juan A. Suarez Romero
87cd11ecd2 nir,v3d: rename tlb_color_v3d intrinsic
As this is intended to be used also by VC4, change the suffix to
something more convenient, like tlb_color_brcm.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29119>
2024-05-13 10:44:17 +00:00
Juan A. Suarez Romero
920025533e broadcom/compiler: do not run lowering I/O for FS
This lowering is applied only for vertex and geometry shaders. So detect
earlier this situation and do not go ahead with other shader.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29102>
2024-05-09 07:55:00 +00:00
Iago Toral Quiroga
1545dc94b4 broadcom/compiler: simplify v3d_vir_emit_tex
In the past where backends had to deal with nir_register we needed
a specific path for them here because nir_def_components_read only
worked on ssa defs, but now that we got rid of nir_register we
only have nir_def and we don't need to go out of our way to do this
and we can just always use nir_def_components_read.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
2024-05-09 09:29:44 +02:00
Iago Toral Quiroga
c24a149d2d broadcom/compiler: don't read excess channels on image loads
This is similar to what we do for textures where we program a number
of channels matching the number of componentes actually read by the
shader.

Makes tests like dEQP-VK.image.load_store.with_format.2d.r32_uint
drop from 18 instructions to 14 by emitting a single ldtmu instead
of 4.

From dEQP-VK.image.load_store.*:

total instructions in shared programs: 12681 -> 12093 (-4.64%)
instructions in affected programs: 4866 -> 4278 (-12.08%)
helped: 256
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.30 x̃: 2
helped stats (rel) min: 4.76% max: 25.00% x̄: 12.35% x̃: 11.11%
95% mean confidence interval for instructions value: -2.40 -2.19
95% mean confidence interval for instructions %-change: -12.99% -11.71%
Instructions are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
2024-05-09 09:29:44 +02:00
Iago Toral Quiroga
ae7f20d8d4 broadcom/compiler: assert on array overflow
If a shader has too may outputs overflowing the array may
overwrite other pieces of state, which can then be tricky to
debug.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
2024-05-09 09:29:44 +02:00
Juan A. Suarez Romero
9f72e22230 broadcom/compiler: remove unused parameters in vpm read
A few of the parameters are not actually used at all. So let's clean
them.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29101>
2024-05-08 14:13:07 +00:00
Marek Olšák
1632948a76 nir: validate src_type of store_output intrinsics, require bit_size >= 16
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28845>
2024-05-01 19:41:35 +00:00
Karol Herbst
d22f936019 nir: remove workgroup_id_zero_base
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
2024-04-24 20:18:49 +00:00
Iago Toral Quiroga
1070c9b0e7 broadcom/compiler: enable perquad with uses_wide_subgroup_intrinsics
This fixes a number of regressions in Vulkan subgroups tests in CTS.

Fixes: 97f5721bfc ('broadcom/compiler: needs_quad_helper_invocation enable PER_QUAD TMU access')
cc: mesa-stable

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28797>
2024-04-18 09:00:56 +00:00
Jose Maria Casanova Crespo
97f5721bfc broadcom/compiler: needs_quad_helper_invocation enable PER_QUAD TMU access
We take advantage of the needs_quad_helper_invocation information to
only enable the PER_QUAD TMU access on Fragment Shaders when it is needed.

PER_QUAD access is also disabled on stages different to fragment shader.
Being enabled was causing MMU errors when TMU was doing indexed by vertexid
reads on disabled lanes on vertex stage. This problem was exercised by some
shaders from the GTK new GSK_RENDERER=ngl that were accessing a constant buffer
offset[6], but having PER_QUAD enabled on the TMU access by VertexID was
doing hidden incorrect access to not existing vertex 6 and 7 as TMU was
accessing the full quad.

cc: mesa-stable

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28740>
2024-04-16 08:20:54 +00:00
Mike Blumenkrantz
7760642d2e v3d: set use_clipdist_array=true for lower_clip?
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28308>
2024-04-11 18:57:25 +00:00
Iago Toral Quiroga
9fad2922fb broadcom/compiler: fix workaround for GFXH-1602
In this scenario drivers are adding a dummy attribute with a size
of 1, so we should account for it here.

Fixes missing window decorations with GTK4+.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10853
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28414>
2024-04-01 09:22:39 +00:00
Yonggang Luo
1ac1c0843f treewide: Replace usage of macro DEBUG with MESA_DEBUG when possible
This is achieved by the following steps:

#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG

This is done by replace in vscode

excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu

These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.

Use debug or NDEBUG instead of DEBUG in comments when proper

This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG

These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe

Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually

DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)

Replace the DEBUG present in comment with proper new MESA_DEBUG manually

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
2024-03-22 18:22:34 +00:00
Juan A. Suarez Romero
69fbd5cb90 v3d: fix line coords with perspective projection
The algorithm used to rendering smooth lines worked under the assumption
that line coords were in the [0, 1] range. This was correct when using
an orthogonal projection, but not when using a perspective projection.

With a perspective projection (where the value for 1/Wc set in the VPM
is not 1.0), line coords values are also affected by this projection, so
the values are not in this range.

To deal with this, we normalize the line coords using the Wc value so
the range becomes [0, 1], and the smooth line rendering works as
expected.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10496
Fixes: ee4d51f8b2 ("v3d: Add a lowering pass for line smoothing")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
2024-03-11 12:42:50 +00:00
Juan A. Suarez Romero
62e1dff256 v3d: add load_fep_w_v3d intrinsic
This intrinsic helps to read the W coordinate stored in the QPU register
when initializing the input data for the fragment shaders.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
2024-03-11 12:42:49 +00:00
Juan A. Suarez Romero
7eae0e03f1 broadcom/compiler: fix SFU check for 7.1
Avoid SFU op when the result would land in other thread.

total instructions in shared programs: 691634 -> 691928 (0.04%)
instructions in affected programs: 44888 -> 45182 (0.65%)
helped: 17
HURT: 211
helped stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1
helped stats (rel) min: 0.19% max: 1.96% x̄: 0.74% x̃: 0.71%
HURT stats (abs)   min: 1 max: 8 x̄: 1.48 x̃: 1
HURT stats (rel)   min: 0.06% max: 14.29% x̄: 2.15% x̃: 1.11%
95% mean confidence interval for instructions value: 1.14 1.44
95% mean confidence interval for instructions %-change: 1.62% 2.24%
Instructions are HURT.

total max-temps in shared programs: 133794 -> 133804 (<.01%)
max-temps in affected programs: 237 -> 247 (4.22%)
helped: 0
HURT: 10
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.72% max: 12.50% x̄: 6.52% x̃: 4.63%
95% mean confidence interval for max-temps value: 1.00 1.00
95% mean confidence interval for max-temps %-change: 3.37% 9.66%
Max-temps are HURT.

total sfu-stalls in shared programs: 818 -> 766 (-6.36%)
sfu-stalls in affected programs: 164 -> 112 (-31.71%)
helped: 79
HURT: 26
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 10.00% max: 100.00% x̄: 76.52% x̃: 100.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.04 x̃: 1
HURT stats (rel)   min: 0.00% max: 100.00% x̄: 12.18% x̃: 0.00%
95% mean confidence interval for sfu-stalls value: -0.67 -0.32
95% mean confidence interval for sfu-stalls %-change: -64.00% -45.11%
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 692452 -> 692694 (0.03%)
inst-and-stalls in affected programs: 36509 -> 36751 (0.66%)
helped: 9
HURT: 181
helped stats (abs) min: 1 max: 2 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.19% max: 1.96% x̄: 0.74% x̃: 0.71%
HURT stats (abs)   min: 1 max: 8 x̄: 1.39 x̃: 1
HURT stats (rel)   min: 0.06% max: 6.25% x̄: 1.57% x̃: 1.09%
95% mean confidence interval for inst-and-stalls value: 1.12 1.43
95% mean confidence interval for inst-and-stalls %-change: 1.27% 1.66%
Inst-and-stalls are HURT.

total nops in shared programs: 25075 -> 25154 (0.32%)
nops in affected programs: 529 -> 608 (14.93%)
helped: 3
HURT: 76
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 5.26% max: 50.00% x̄: 21.75% x̃: 10.00%
HURT stats (abs)   min: 1 max: 6 x̄: 1.08 x̃: 1
HURT stats (rel)   min: 1.04% max: 200.00% x̄: 48.08% x̃: 50.00%
95% mean confidence interval for nops value: 0.84 1.16
95% mean confidence interval for nops %-change: 36.45% 54.40%
Nops are HURT.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28009>
2024-03-11 10:23:48 +00:00
Iago Toral Quiroga
cc7934a89b broadcom/compiler: fix lane selection for subgroups in fragment shaders
It seems the hardware behavior for this is as per-spec and we are
supposed to identify as active entire quads. Particularly, there
are some derivative tests with dynamic control flow that use
subgroup ballot and require this.

However, we still need to exclude terminted lanes (OpTerminate). For
that, we keep track of the sample mask at the start of a fagment
shader start and compare it with the current sample mask.

Fixes: ('broadcom/compiler: support subgroup reduction operations from fragment shaders')
Fixes: dEQP-VK.glsl.derivate.dynamic_loop.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27409>
2024-02-14 08:02:41 +01:00
Iago Toral Quiroga
e5bfce6f46 broadcom/compiler: support subgroup reduction operations from fragment shaders
In fragment shaders these instructions consider a lane active when
any lane in the same quad is active, which is not what we want, so
we need to include the current sample mask in the condition mask
used with these instructions to limit lane selection to those that
are really active.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
3544113ee2 brodcom/compiler: implement non-compute TSY barrier
The BARRIER_ID instruction is only available in compute
and tessellation, implement an equivalent barrier that we
can use from other stages.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
5b269814fc broadcom/compiler: be more careful with unifa in non-uniform control flow
If the lane from which the hardware writes the unifa address
is disabled, then we may end up with a bogus address and invalid
memory accesses from follow-up ldunifa.

Instead of always disabling unifa loads in non-uniform control
flow we can try to see if the address is prouced from a nir
register (which is the only case where we do conditional writes
under non-uniform control flow in ntq_store_def), and only
disable it in that case.

When enabling subgroups for graphics pipelines, this fixes a
GMP violation in the simulator with the following test
(which has non-uniform control flow writing unifa with lane 0
disabled, which is the lane from which the unifa takes the
address):
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcastfirst_int

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
93df9800e8 broadcom/compiler: support subgroup quad
These can all be implemented as specialized shuffles, so we
use the NIR lowering to do that.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
69d3b90839 broadcom/compiler: support subgroup vote
We don't rely in any lowerings for these (other than
scalarization). The only noteworthy aspect is that these
instructions, like ballot, use the condition mask to
filter out valid invocations that are inactive because of
control flow.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
3222fd71a1 broadcom/compiler: support subgroup shuffle
This maps to our native shuffle instruction. For shuffle relative
and shuffle xor, we rely on the nir lowering to lower this to
ALU and regular shuffle.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
29a5e3e615 broadcom/compiler: support subgroup ballot
This adds support in our compiler for the subgroup ballot
feature. To this end we start using the NIR lowering for
subgroups which can lowers some of these intrinsics into
things more amenable to our hardware and takes care of
scalarization.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
295f906517 broadcom/compiler: don't move subgroup reduction instructions above setmsf
These use the sample mask to decide about active lanes, so we need
to make sure we don't move them above a previous setmsf instruction.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
9bbfbc2089 broadcom/compiler: add new SFU instructions in V3D 7.x
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
7bdc8898b1 broadcom/compiler: fix incorrect flags update for subgroup elect
c->execute is 0 (not the block index) for lanes currently active
under non-uniform control flow.

Also this simplifies a bit the instructions we emit for flag
generation, both for uniform and non-uniform control flow.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Iago Toral Quiroga
29d4924e5e broadcom/compiler: fix incorrect flags setup in non-uniform if path
If the ELSE block is cheap then we don't emit the branch instruction
but we still want to generate the flags, since these are setting
the flags for the THEN block too.

Fixes: e401add741 ("broadcom/compiler: skip jumps in non-uniform if/then when block cost is small")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
2024-01-31 10:06:06 +00:00
Alejandro Piñeiro
ffd0e3a7fe broadcom/compiler: fix coverity warning (unitialized pointer read)
Full coverity warning:

   CID 1558604: Uninitialized pointer read (UNINIT)12. uninit_use_in_call: Using uninitialized value *results when calling nir_vec.
236        return nir_vec(b, results, DIV_ROUND_UP(num_components, 2));

To fix it we initialize the variables, provide a unreachable on the
switch that sets the results values. As we are here we also move a
comment to make things more clear.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26951>
2024-01-22 16:46:57 +01:00
Iago Toral Quiroga
5c42d6c62f v3dv: implement VK_EXT_shader_demote_to_helper_invocation
Demoting means that we don't execute any writes to memory but
otherwise the invocation continues to execute. Particularly,
subgroup operations and derivatives must work.

Our implementation of discard does exactly this by using
setmsf to prevent writes for the affected invocations, the
only difference for us is that with discard/terminate we
want to be more careful with emitting quad loads for tmu
operations, since the invocations are not supposed to be
running any more and load offsets may not be valid, but with
demote the invocations are not terminated and thus we should
emit memory reads for them to ensure quad operations and
derivatives from invocations that have not been demoted still
work.

Since we use the sample mask to implement demotes we can't tell
whether a particular helper invocation was originally such
(gl_HelperInvocation in GLSL) or was later demoted
(OpIsHelperInvocationEXT added with SPV_EXT_demote_to_helper_invocation),
so we use nir_lower_is_helper_invocation to take care of this.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26949>
2024-01-09 13:22:37 +00:00
Alejandro Piñeiro
54e2e44f99 broadcom/compiler: remove one superfluous call to nir_opt_undef
v3d_optimize_nir is calling nir_opt_undef twice. As it is inside the
usual "do {..} while (progress);" loop, is not needed to call it
twice.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26928>
2024-01-09 12:02:36 +01:00
Daniel Schürmann
a3ed36da1a treewide: replace calls to nir_opt_trivial_continues() with nir_opt_loop()
Totals from 850 (1.11% of 76636) affected shaders: (RADV, GFX11)
MaxWaves: 18134 -> 18130 (-0.02%)
Instrs: 3011298 -> 3008585 (-0.09%); split: -0.17%, +0.08%
CodeSize: 15836804 -> 15841972 (+0.03%); split: -0.09%, +0.12%
VGPRs: 63580 -> 63604 (+0.04%)
SpillSGPRs: 966 -> 1148 (+18.84%); split: -0.83%, +19.67%
Latency: 36102291 -> 30186144 (-16.39%); split: -16.41%, +0.02%
InvThroughput: 9058100 -> 7011821 (-22.59%); split: -22.61%, +0.02%
VClause: 65369 -> 65364 (-0.01%); split: -0.03%, +0.02%
SClause: 100309 -> 100305 (-0.00%); split: -0.04%, +0.04%
Copies: 335658 -> 336472 (+0.24%); split: -0.70%, +0.94%
Branches: 110806 -> 108945 (-1.68%); split: -1.94%, +0.26%
PreSGPRs: 73476 -> 73934 (+0.62%); split: -0.25%, +0.87%
PreVGPRs: 58809 -> 58840 (+0.05%); split: -0.01%, +0.06%

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940>
2024-01-03 20:48:04 +00:00
Iago Toral Quiroga
5057eb90a1 v3dv: implement VK_KHR_shader_terminate_invocation
The semantics for this matches those of discard.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
2023-12-15 16:35:50 +00:00
Iago Toral Quiroga
d0f75fdeab broadcom: lower null pointers
We only support the variablePointersStorageBuffer feature of variable pointers,
which basically ensures that pointers may only target one buffer. This means
that a particular pointer may change where it points within a given buffer but
it cannot change its value to point to some other buffer. This is a requirement
from us since we expect buffer indices on buffer loads and stores to be
constant, so we can't have a buffer load come through a pointer that may
be assigned to different buffers, since in that case the buffer index
would need to come from bcsel.

There is, however, a small complication: the spec still allows pointers to
be null, and NIR defines null pointers to use 0xffffffff for both the buffer
index and the offset, which will cause a problem in a scenario like this:

int *b = ...
if (cond) {
   b = null;
   discard;
}
ubo_load(b);

Here the buffer index for the ubo load may come from a bcsel choosing between
the null pointer (0xffffffff) and the valid address (let's say 0), so we don't
have a constant and we assert fail.

This change detects this scenario and upon finding it will rewrite the buffer
index on the null pointer branch of the bcsel to match that of the valid
branch so that later optimizations passes can remove the bcsel and we end up
with a constant index. This is fine because a null pointer dereference is
undefined behavior and it is not something we should see in valid applications.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
2023-12-15 16:35:50 +00:00
Iago Toral Quiroga
716847a77d broadcom: disable perquad tmu loads after discards
Otherwise we may emit a load from an invalid offset from
a lane that was discarded.

This fixes an simulator assert from triggering when
executing:
dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_null_pointer_load

That test emits a conditional kill and then a buffer load
which would have invalid offsets for the lines killed. Since
the buffer load is in uniform control flow we were incorrectly
emitting a full quad load, including disabled lanes which would
prompt the simulator to assert on invalid offsets being loaded
coming from the lanes that had been killed in the shader.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
2023-12-15 16:35:50 +00:00
Iago Toral Quiroga
6b89c71c90 broadcom: fix scheduling dependencies for SETMSF instruction
We use SETMSF to implement discard, so we need to ensure that any
TMU writes after a SETMSF don't actually execute. We emit a TMU flush
before a discard but we also need to ensure that the QPU scheduler
honors this.

Fixes some tests in dEQP-VK.spirv_assembly.instruction.terminate_invocation.*
when we expose the extension that would otherwise fail because the
QPU scheduler would incorrectly move some image writes emitted after a SETMSF
before the SETMSF instruction.

Also fixes spec@arb_shader_atomic_counters@fragment-discard

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26631>
2023-12-12 12:58:42 +00:00
Yonggang Luo
8fa16452ba broadcom/compiler: remove include of gallium headers from meson.build
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26579>
2023-12-12 10:03:11 +00:00
Yonggang Luo
238a9ef5ff broadcom/(compiler,common): avoid include of gallium headers in header files
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26579>
2023-12-12 10:03:10 +00:00
Yonggang Luo
e5ebd59dd5 broadcom: remove unused headers include
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26571>
2023-12-07 17:22:47 +00:00
Yonggang Luo
35133551e1 broadcom/compiler: remove unused blend in v3d_fs_key
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26571>
2023-12-07 17:22:47 +00:00
Yonggang Luo
575c4f6802 broadcom/compiler: Use correct type pipe_logicop for logicop_func in struct v3d_fs_key
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26570>
2023-12-07 16:36:50 +00:00
Alejandro Piñeiro
8191acd41e broadcom/compiler: update image store lowering to use v71 new packing/conversion instructions
Vulkan shaderdb stats with pattern dEQP-VK.image.*.with_format.*.*:
   total instructions in shared programs: 35993 -> 33245 (-7.63%)
   instructions in affected programs: 21153 -> 18405 (-12.99%)
   helped: 394
   HURT: 1
   Instructions are helped.

   total uniforms in shared programs: 8550 -> 7418 (-13.24%)
   uniforms in affected programs: 5136 -> 4004 (-22.04%)
   helped: 399
   HURT: 0
   Uniforms are helped.

   total max-temps in shared programs: 6014 -> 5905 (-1.81%)
   max-temps in affected programs: 473 -> 364 (-23.04%)
   helped: 58
   HURT: 0
   Max-temps are helped.

   total nops in shared programs: 1515 -> 1504 (-0.73%)
   nops in affected programs: 46 -> 35 (-23.91%)
   helped: 14
   HURT: 2
   Inconclusive result (%-change mean confidence interval includes 0).

FWIW, that one HURT on the instructions count is for just one
instruction.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25726>
2023-11-20 08:20:31 +00:00
Maíra Canal
4d95b4861e v3dv: implement VK_EXT_multi_draw
Implement the Vulkan extension VK_EXT_multi_draw. It was tested with
deqp-vk -n dEQP-VK.draw.*multi_draw*.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26138>
2023-11-14 06:20:21 +00:00
Faith Ekstrand
2db20af82e v3d: Stop assuming glsl_get_length() returns 0 for vectors
Checking for whether or not it's a plain vector is actually what we want
anyway. There's no point in handling arays of length 1.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22580>
2023-11-02 20:28:46 +00:00
Alejandro Piñeiro
e9fa6c0bc6 broadcom/compiler: set properly lod query
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25851>
2023-11-02 11:59:08 +01:00
Alejandro Piñeiro
85f26828fe broadcom: only support v42 and v71
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25851>
2023-11-02 11:59:08 +01:00
Alejandro Piñeiro
c24a635d1c broadcom/compiler: add v3d_pack_unnormalized_coordinates helper
So far we were packing by hand unnormalized coordinates at the
V3D41_TMU_CONFIG_PARAMETER_1 pack structure. To get this working we
hardcoded V3D_VERSION to 41 at v3dv_uniforms, that works for v71
because the structure are the same. But that is somewhat ugly, and
will not work if a new hw generation have a different structure.

Additionally, we found that for v3d this will be also needed.

So this commit adds a helper on the compiler. For now, and to simplify
it also use just one method for both generations. This solves the
problem of the same code needed on both v3d and v3dv.

But the idea is that in the future we need a similar need, but the
structure different on each generation, it would have used a similar
approach to other generation dependent function calls (like
v3d40_vir_emit_tex), having the implementation on a source file that
can safely include the hw generation headers.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25544>
2023-10-31 13:00:34 +01:00