I profiled a revalidator thread with perf and found that __aarch64_cas4_relax takes a lot of CPU time:
51.17% revalidator260 libofproto-2.16.so.0.0.2 [.] __aarch64_cas4_relax
 8.46% revalidator260 libofproto-2.16.so.0.0.2 [.] ofproto_try_ref
 8.41% revalidator260 libofproto-2.16.so.0.0.2 [.] __aarch64_ldadd4_rel
 4.91% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] classifier_lookup__
 4.08% revalidator260 libofproto-2.16.so.0.0.2 [.] __aarch64_ldadd4_relax
 2.35% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] ccmap_find
 2.10% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] cmap_find
 1.61% revalidator260 libpthread-2.28.so [.] 0x0000000000014660
 1.51% revalidator260 libpthread-2.28.so [.] 0x00000000000148d0
 1.27% revalidator260 libofproto-2.16.so.0.0.2 [.] __aarch64_ldadd8_relax
 0.76% revalidator260 libofproto-2.16.so.0.0.2 [.] do_xlate_actions
 0.62% revalidator260 libc-2.28.so [.] 0x000000000010ccb0
 0.34% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] ovs_mutex_lock_at
 0.32% revalidator260 libofproto-2.16.so.0.0.2 [.] ukey_lookup.isra.31
 0.32% revalidator260 [kernel.kallsyms] [k] sched_group_set_shares
 0.31% revalidator260 libofproto-2.16.so.0.0.2 [.] xlate_table_action
 0.30% revalidator260 libc-2.28.so [.] 0x00000000000847f0
 0.26% revalidator260 libc-2.28.so [.] 0x00000000000847e0
 0.25% revalidator260 [kernel.kallsyms] [k] find_vpid
 0.24% revalidator260 libc-2.28.so [.] 0x00000000000847f8
 0.24% revalidator260 libc-2.28.so [.] 0x00000000000847e4
 0.22% revalidator260 libc-2.28.so [.] 0x00000000000847ec
 0.21% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] cmap_next_position
 0.20% revalidator260 libofproto-2.16.so.0.0.2 [.] xlate_push_stats_entry
 0.20% revalidator260 libc-2.28.so [.] 0x00000000000847f4
 0.20% revalidator260 libc-2.28.so [.] 0x00000000000847fc
 0.20% revalidator260 libofproto-2.16.so.0.0.2 [.] rule_dpif_lookup_from_table
 0.19% revalidator260 libc-2.28.so [.] 0x00000000000847e8
 0.18% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] mf_set_flow_value
 0.17% revalidator260 libopenvswitch-2.16.so.0.0.2 [.] dp_netdev_flow_to_dpif_flow
The function __aarch64_cas4_relax atomically compares a 32-bit value in memory with an expected value and, if they match, swaps it with a new value, all with relaxed memory ordering. It is one of GCC's aarch64 outline-atomics helpers, and the calls here come from the CAS loop in ovs_refcount_try_ref_rcu:
static inline bool
ovs_refcount_try_ref_rcu(struct ovs_refcount *refcount)
{
    unsigned int count;

    atomic_read_explicit(&refcount->count, &count, memory_order_relaxed);
    do {
        if (count == 0) {
            return false;
        }
    } while (!atomic_compare_exchange_weak_explicit(&refcount->count, &count,
                                                    count + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}
I can't understand why atomic_compare_exchange_weak_explicit takes so much CPU time on aarch64. Some explanations suggest that on aarch64 a weak CAS has a high chance of failing spuriously, which forces extra iterations of the retry loop. Can we use the strong variant instead of the weak one?
#define atomic_compare_exchange_weak \
        atomic_compare_exchange_strong
#define atomic_compare_exchange_weak_explicit \
        atomic_compare_exchange_strong_explicit