fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-15 06:40:27 +01:00

Author	SHA1	Message	Date
Iago Toral Quiroga	cc7934a89b	broadcom/compiler: fix lane selection for subgroups in fragment shaders It seems the hardware behavior for this is as per-spec and we are supposed to identify entire quads as active. Particularly, there are some derivative tests with dynamic control flow that use subgroup ballot and require this. However, we still need to exclude terminated lanes (OpTerminate). For that, we keep track of the sample mask at the start of a fragment shader and compare it with the current sample mask. Fixes: ('broadcom/compiler: support subgroup reduction operations from fragment shaders') Fixes: dEQP-VK.glsl.derivate.dynamic_loop.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27409>	2024-02-14 08:02:41 +01:00
Iago Toral Quiroga	e5bfce6f46	broadcom/compiler: support subgroup reduction operations from fragment shaders In fragment shaders these instructions consider a lane active when any lane in the same quad is active, which is not what we want, so we need to include the current sample mask in the condition mask used with these instructions to limit lane selection to those that are really active. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	3544113ee2	broadcom/compiler: implement non-compute TSY barrier The BARRIER_ID instruction is only available in compute and tessellation, so implement an equivalent barrier that we can use from other stages. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	5b269814fc	broadcom/compiler: be more careful with unifa in non-uniform control flow If the lane from which the hardware writes the unifa address is disabled, then we may end up with a bogus address and invalid memory accesses from follow-up ldunifa. Instead of always disabling unifa loads in non-uniform control flow we can try to see if the address is produced from a nir register (which is the only case where we do conditional writes under non-uniform control flow in ntq_store_def), and only disable it in that case. When enabling subgroups for graphics pipelines, this fixes a GMP violation in the simulator with the following test (which has non-uniform control flow writing unifa with lane 0 disabled, which is the lane from which the unifa takes the address): dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcastfirst_int Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	93df9800e8	broadcom/compiler: support subgroup quad These can all be implemented as specialized shuffles, so we use the NIR lowering to do that. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	69d3b90839	broadcom/compiler: support subgroup vote We don't rely on any lowerings for these (other than scalarization). The only noteworthy aspect is that these instructions, like ballot, use the condition mask to filter out valid invocations that are inactive because of control flow. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	3222fd71a1	broadcom/compiler: support subgroup shuffle This maps to our native shuffle instruction. For shuffle relative and shuffle xor, we rely on the nir lowering to lower this to ALU and regular shuffle. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	29a5e3e615	broadcom/compiler: support subgroup ballot This adds support in our compiler for the subgroup ballot feature. To this end we start using the NIR lowering for subgroups, which can lower some of these intrinsics into things more amenable to our hardware and takes care of scalarization. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	295f906517	broadcom/compiler: don't move subgroup reduction instructions above setmsf These use the sample mask to decide about active lanes, so we need to make sure we don't move them above a previous setmsf instruction. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	9bbfbc2089	broadcom/compiler: add new SFU instructions in V3D 7.x Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	7bdc8898b1	broadcom/compiler: fix incorrect flags update for subgroup elect c->execute is 0 (not the block index) for lanes currently active under non-uniform control flow. Also this simplifies a bit the instructions we emit for flag generation, both for uniform and non-uniform control flow. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Iago Toral Quiroga	29d4924e5e	broadcom/compiler: fix incorrect flags setup in non-uniform if path If the ELSE block is cheap then we don't emit the branch instruction but we still want to generate the flags, since these are setting the flags for the THEN block too. Fixes: `e401add741` ("broadcom/compiler: skip jumps in non-uniform if/then when block cost is small") Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>	2024-01-31 10:06:06 +00:00
Alejandro Piñeiro	ffd0e3a7fe	broadcom/compiler: fix coverity warning (uninitialized pointer read) Full coverity warning: CID 1558604: Uninitialized pointer read (UNINIT) 12. uninit_use_in_call: Using uninitialized value *results when calling nir_vec. 236 return nir_vec(b, results, DIV_ROUND_UP(num_components, 2)); To fix it we initialize the variables and provide an unreachable in the switch that sets the results values. As we are here we also move a comment to make things clearer. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26951>	2024-01-22 16:46:57 +01:00
Iago Toral Quiroga	5c42d6c62f	v3dv: implement VK_EXT_shader_demote_to_helper_invocation Demoting means that we don't execute any writes to memory but otherwise the invocation continues to execute. Particularly, subgroup operations and derivatives must work. Our implementation of discard does exactly this by using setmsf to prevent writes for the affected invocations, the only difference for us is that with discard/terminate we want to be more careful with emitting quad loads for tmu operations, since the invocations are not supposed to be running any more and load offsets may not be valid, but with demote the invocations are not terminated and thus we should emit memory reads for them to ensure quad operations and derivatives from invocations that have not been demoted still work. Since we use the sample mask to implement demotes we can't tell whether a particular helper invocation was originally such (gl_HelperInvocation in GLSL) or was later demoted (OpIsHelperInvocationEXT added with SPV_EXT_demote_to_helper_invocation), so we use nir_lower_is_helper_invocation to take care of this. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26949>	2024-01-09 13:22:37 +00:00
Alejandro Piñeiro	54e2e44f99	broadcom/compiler: remove one superfluous call to nir_opt_undef v3d_optimize_nir is calling nir_opt_undef twice. As it is inside the usual "do {..} while (progress);" loop, there is no need to call it twice. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26928>	2024-01-09 12:02:36 +01:00
Daniel Schürmann	a3ed36da1a	treewide: replace calls to nir_opt_trivial_continues() with nir_opt_loop() Totals from 850 (1.11% of 76636) affected shaders: (RADV, GFX11) MaxWaves: 18134 -> 18130 (-0.02%) Instrs: 3011298 -> 3008585 (-0.09%); split: -0.17%, +0.08% CodeSize: 15836804 -> 15841972 (+0.03%); split: -0.09%, +0.12% VGPRs: 63580 -> 63604 (+0.04%) SpillSGPRs: 966 -> 1148 (+18.84%); split: -0.83%, +19.67% Latency: 36102291 -> 30186144 (-16.39%); split: -16.41%, +0.02% InvThroughput: 9058100 -> 7011821 (-22.59%); split: -22.61%, +0.02% VClause: 65369 -> 65364 (-0.01%); split: -0.03%, +0.02% SClause: 100309 -> 100305 (-0.00%); split: -0.04%, +0.04% Copies: 335658 -> 336472 (+0.24%); split: -0.70%, +0.94% Branches: 110806 -> 108945 (-1.68%); split: -1.94%, +0.26% PreSGPRs: 73476 -> 73934 (+0.62%); split: -0.25%, +0.87% PreVGPRs: 58809 -> 58840 (+0.05%); split: -0.01%, +0.06% Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940>	2024-01-03 20:48:04 +00:00
Iago Toral Quiroga	5057eb90a1	v3dv: implement VK_KHR_shader_terminate_invocation The semantics for this matches those of discard. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>	2023-12-15 16:35:50 +00:00
Iago Toral Quiroga	d0f75fdeab	broadcom: lower null pointers We only support the variablePointersStorageBuffer feature of variable pointers, which basically ensures that pointers may only target one buffer. This means that a particular pointer may change where it points within a given buffer but it cannot change its value to point to some other buffer. This is a requirement from us since we expect buffer indices on buffer loads and stores to be constant, so we can't have a buffer load come through a pointer that may be assigned to different buffers, since in that case the buffer index would need to come from bcsel. There is, however, a small complication: the spec still allows pointers to be null, and NIR defines null pointers to use 0xffffffff for both the buffer index and the offset, which will cause a problem in a scenario like this: int *b = ... if (cond) { b = null; discard; } ubo_load(b); Here the buffer index for the ubo load may come from a bcsel choosing between the null pointer (0xffffffff) and the valid address (let's say 0), so we don't have a constant and we assert fail. This change detects this scenario and upon finding it will rewrite the buffer index on the null pointer branch of the bcsel to match that of the valid branch so that later optimization passes can remove the bcsel and we end up with a constant index. This is fine because a null pointer dereference is undefined behavior and it is not something we should see in valid applications. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>	2023-12-15 16:35:50 +00:00
Iago Toral Quiroga	716847a77d	broadcom: disable perquad tmu loads after discards Otherwise we may emit a load from an invalid offset from a lane that was discarded. This fixes a simulator assert from triggering when executing: dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_null_pointer_load That test emits a conditional kill and then a buffer load which would have invalid offsets for the lanes killed. Since the buffer load is in uniform control flow we were incorrectly emitting a full quad load, including disabled lanes, which would prompt the simulator to assert on invalid offsets being loaded coming from the lanes that had been killed in the shader. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>	2023-12-15 16:35:50 +00:00
Iago Toral Quiroga	6b89c71c90	broadcom: fix scheduling dependencies for SETMSF instruction We use SETMSF to implement discard, so we need to ensure that any TMU writes after a SETMSF don't actually execute. We emit a TMU flush before a discard but we also need to ensure that the QPU scheduler honors this. Fixes some tests in dEQP-VK.spirv_assembly.instruction.terminate_invocation.* when we expose the extension that would otherwise fail because the QPU scheduler would incorrectly move some image writes emitted after a SETMSF before the SETMSF instruction. Also fixes spec@arb_shader_atomic_counters@fragment-discard Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26631>	2023-12-12 12:58:42 +00:00
Yonggang Luo	8fa16452ba	broadcom/compiler: remove include of gallium headers from meson.build Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26579>	2023-12-12 10:03:11 +00:00
Yonggang Luo	238a9ef5ff	broadcom/(compiler,common): avoid include of gallium headers in header files Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26579>	2023-12-12 10:03:10 +00:00
Yonggang Luo	e5ebd59dd5	broadcom: remove unused headers include Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26571>	2023-12-07 17:22:47 +00:00
Yonggang Luo	35133551e1	broadcom/compiler: remove unused blend in v3d_fs_key Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26571>	2023-12-07 17:22:47 +00:00
Yonggang Luo	575c4f6802	broadcom/compiler: Use correct type pipe_logicop for logicop_func in struct v3d_fs_key Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26570>	2023-12-07 16:36:50 +00:00
Alejandro Piñeiro	8191acd41e	broadcom/compiler: update image store lowering to use v71 new packing/conversion instructions Vulkan shaderdb stats with pattern dEQP-VK.image..with_format..*: total instructions in shared programs: 35993 -> 33245 (-7.63%) instructions in affected programs: 21153 -> 18405 (-12.99%) helped: 394 HURT: 1 Instructions are helped. total uniforms in shared programs: 8550 -> 7418 (-13.24%) uniforms in affected programs: 5136 -> 4004 (-22.04%) helped: 399 HURT: 0 Uniforms are helped. total max-temps in shared programs: 6014 -> 5905 (-1.81%) max-temps in affected programs: 473 -> 364 (-23.04%) helped: 58 HURT: 0 Max-temps are helped. total nops in shared programs: 1515 -> 1504 (-0.73%) nops in affected programs: 46 -> 35 (-23.91%) helped: 14 HURT: 2 Inconclusive result (%-change mean confidence interval includes 0). FWIW, that one HURT on the instructions count is for just one instruction. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25726>	2023-11-20 08:20:31 +00:00
Maíra Canal	4d95b4861e	v3dv: implement VK_EXT_multi_draw Implement the Vulkan extension VK_EXT_multi_draw. It was tested with deqp-vk -n dEQP-VK.draw.multi_draw. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26138>	2023-11-14 06:20:21 +00:00
Faith Ekstrand	2db20af82e	v3d: Stop assuming glsl_get_length() returns 0 for vectors Checking for whether or not it's a plain vector is actually what we want anyway. There's no point in handling arrays of length 1. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22580>	2023-11-02 20:28:46 +00:00
Alejandro Piñeiro	e9fa6c0bc6	broadcom/compiler: set properly lod query Acked-by: Emma Anholt <emma@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25851>	2023-11-02 11:59:08 +01:00
Alejandro Piñeiro	85f26828fe	broadcom: only support v42 and v71 Acked-by: Emma Anholt <emma@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25851>	2023-11-02 11:59:08 +01:00
Alejandro Piñeiro	c24a635d1c	broadcom/compiler: add v3d_pack_unnormalized_coordinates helper So far we were packing unnormalized coordinates by hand in the V3D41_TMU_CONFIG_PARAMETER_1 pack structure. To get this working we hardcoded V3D_VERSION to 41 in v3dv_uniforms, which works for v71 because the structures are the same. But that is somewhat ugly, and will not work if a new hw generation has a different structure. Additionally, we found that this will also be needed for v3d. So this commit adds a helper to the compiler. For now, and to simplify, it also uses just one method for both generations. This solves the problem of the same code being needed in both v3d and v3dv. The idea is that if in the future we have a similar need but the structure differs between generations, we would use an approach similar to other generation-dependent function calls (like v3d40_vir_emit_tex), with the implementation in a source file that can safely include the hw generation headers. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25544>	2023-10-31 13:00:34 +01:00
Iago Toral Quiroga	9e90d95508	v3d,v3dv: support up to 8 render targets in v7.1+ Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:43 +00:00
Alejandro Piñeiro	452421dfe5	v3dv: no specific separate_segments flag for V3D 7.1 On V3D 7.1 there is not a flag on the Shader State Record to specify if we are using shared or separate segments. This is done by setting the vpm input size to 0 (so we need to ensure that the output would be the max needed for input/output). We were already doing the latter on the prog_data_vs, so we just need to use those values, instead of assigning default values. As we are here, we also add some comments on the compiler part. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:43 +00:00
Iago Toral Quiroga	8c191d1103	broadcom/compiler: update thread end restrictions validation for v71 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:43 +00:00
Iago Toral Quiroga	1f5a3391bb	broadcom/compiler: only assign rf0 as last resort in V3D 7.x So we can use it for ldunif(a) and avoid generating ldunif(a)rf which can't be paired with conditional instructions. shader-db (pi5): total instructions in shared programs: 11357802 -> 11338883 (-0.17%) instructions in affected programs: 7117889 -> 7098970 (-0.27%) helped: 24264 HURT: 17574 Instructions are helped. total uniforms in shared programs: 3857808 -> 3857815 (<.01%) uniforms in affected programs: 92 -> 99 (7.61%) helped: 0 HURT: 1 total max-temps in shared programs: 2230904 -> 2230199 (-0.03%) max-temps in affected programs: 52309 -> 51604 (-1.35%) helped: 1219 HURT: 725 Max-temps are helped. total sfu-stalls in shared programs: 15021 -> 15236 (1.43%) sfu-stalls in affected programs: 6848 -> 7063 (3.14%) helped: 1866 HURT: 1704 Inconclusive result total inst-and-stalls in shared programs: 11372823 -> 11354119 (-0.16%) inst-and-stalls in affected programs: 7149177 -> 7130473 (-0.26%) helped: 24315 HURT: 17561 Inst-and-stalls are helped. total nops in shared programs: 273624 -> 273711 (0.03%) nops in affected programs: 31562 -> 31649 (0.28%) helped: 1619 HURT: 1854 Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	c8e4ee8ecb	broadcom/compiler: don't assign registers to unused nodes/temps In programs with a lot of unused temps, if we don't do this, we may end up recycling previously used rfs more often, which can be detrimental to instruction pairing. total instructions in shared programs: 11464335 -> 11444136 (-0.18%) instructions in affected programs: 8976743 -> 8956544 (-0.23%) helped: 33196 HURT: 33778 Inconclusive result total max-temps in shared programs: 2230150 -> 2229445 (-0.03%) max-temps in affected programs: 86413 -> 85708 (-0.82%) helped: 2217 HURT: 1523 Max-temps are helped. total sfu-stalls in shared programs: 18077 -> 17104 (-5.38%) sfu-stalls in affected programs: 8669 -> 7696 (-11.22%) helped: 2657 HURT: 2182 Sfu-stalls are helped. total inst-and-stalls in shared programs: 11482412 -> 11461240 (-0.18%) inst-and-stalls in affected programs: 8995697 -> 8974525 (-0.24%) helped: 33319 HURT: 33708 Inconclusive result total nops in shared programs: 298140 -> 296185 (-0.66%) nops in affected programs: 52805 -> 50850 (-3.70%) helped: 3797 HURT: 2662 Inconclusive result Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	ce13aa4ee7	broadcom/compiler: improve allocation for final program instructions The last 3 instructions can't use specific registers so flag all the nodes for temps used in the last program instructions and try to avoid assigning any of these. This may help us avoid injecting nops for the last thread switch instruction. Because register allocation needs to happen before QPU scheduling and instruction merging we can't tell exactly what the last 3 instructions will be, so we do this for a few more instructions than just 3. We only do this for fragment shaders because other shader stages always end with VPM store instructions that take a small immediate and therefore will never allow us to merge the final thread switch earlier, so limiting allocation for these shaders will never improve anything and might instead be detrimental. total instructions in shared programs: 11471389 -> 11464335 (-0.06%) instructions in affected programs: 582908 -> 575854 (-1.21%) helped: 4669 HURT: 578 Instructions are helped. total max-temps in shared programs: 2230497 -> 2230150 (-0.02%) max-temps in affected programs: 5662 -> 5315 (-6.13%) helped: 344 HURT: 44 Max-temps are helped. total sfu-stalls in shared programs: 18068 -> 18077 (0.05%) sfu-stalls in affected programs: 264 -> 273 (3.41%) helped: 37 HURT: 48 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 11489457 -> 11482412 (-0.06%) inst-and-stalls in affected programs: 585180 -> 578135 (-1.20%) helped: 4659 HURT: 588 Inst-and-stalls are helped. total nops in shared programs: 301738 -> 298140 (-1.19%) nops in affected programs: 14680 -> 11082 (-24.51%) helped: 3252 HURT: 108 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	818fc41e7e	broadcom/compiler: don't allocate spill base to rf0 in V3D 7.x Otherwise it can be stomped by instructions doing implicit rf0 writes. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	84c912c1d4	broadcom/compiler: fix up copy propagation for v71 Update rules for unsafe copy propagations to match v7.x. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	1e85be415a	broadcom/compiler: lift restriction on vpmwt in last instruction for V3D 7.x Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	2774601780	broadcom/compiler: validate restrictions after TLB Z write Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	d4285d7f2a	broadcom/compiler: start allocating from RF 4 in V7.x In V3D 4.x we start at RF3 so that we allocate RF0-2 only if there aren't any other RFs available. This is useful with small shaders to ensure that our TLB writes don't use these registers because these are the last instructions we emit in fragment shaders and the last instructions in a program can't write to these registers, so if we do, we need to emit NOPs. In V3D 7.x the registers affected by this restriction are RF2-3, so we choose to start at RF4. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	2b39bb35c5	broadcom/compiler: lift restriction for branch + msfign after setmsf for v7.x Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	5e9b405aa7	broadcom/compiler: update ldvary thread switch delay slot restriction for v7.x In V3D 7.x we don't have accumulators which would not survive a thread switch, so the only restriction is that ldvary can't be placed in the second delay slot of a thread switch. shader-db results for UnrealEngine4 shaders: total instructions in shared programs: 446458 -> 446401 (-0.01%) instructions in affected programs: 13492 -> 13435 (-0.42%) helped: 58 HURT: 3 Instructions are helped. total nops in shared programs: 19571 -> 19541 (-0.15%) nops in affected programs: 161 -> 131 (-18.63%) helped: 30 HURT: 0 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	526c1889e5	broadcom/compiler: update thread end restrictions for v7.x In 4.x it is not allowed to write to the register file in the last 3 instructions, but in 7.x we only have this restriction in the thread end instruction itself, and only if the write comes from the ALU ports. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	ced83e7803	broadcom/compiler: implement small immediates for v71 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	e4d30600a4	broadcom/compiler: convert mul to add when needed to allow merge V3D 7.x added 'mov' opcodes to the ADD alu, so now it is possible to move these to the ADD alu to facilitate merging them with other MUL instructions. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	cbedf14687	broadcom/compiler: don't assign rf0 to temps that conflict with ldvary ldvary writes to rf0 implicitly, so we don't want to allocate rf0 to any temps that are live across ldvary's rf0 live ranges. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	3a36a618d7	broadcom/compiler: try to use ldunif(a) instead of ldunif(a)rf in v71 The rf variants need to encode the destination in the cond bits, which prevents these from being merged with any other instruction that needs them. In 4.x, ldunif(a) writes to r5, which is a special register that only ldunif(a) and ldvary can write, so we have a special register class for it and only allow it for them. Then when we need to choose a register for a node, if this register is available we always use it. In 7.x these instructions write to rf0, which can be used by any instruction, so instead of restricting rf0, we track the temps that are used as ldunif(a) destinations and use that information to favor rf0 for them. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00
Iago Toral Quiroga	d8a25bdb07	broadcom/compiler: enable ldvary pipelining on v71 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>	2023-10-13 22:37:42 +00:00

840 commits