Page MenuHomeFreeBSD

carp: carp_master_down_locked() requires net epoch
ClosedPublic

Authored by zlei on Mar 16 2023, 8:06 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Oct 30, 10:00 PM
Unknown Object (File)
Mon, Oct 28, 5:06 PM
Unknown Object (File)
Oct 8 2024, 7:04 AM
Unknown Object (File)
Sep 22 2024, 11:48 PM
Unknown Object (File)
Sep 22 2024, 7:10 PM
Unknown Object (File)
Sep 22 2024, 11:58 AM
Unknown Object (File)
Sep 22 2024, 3:58 AM
Unknown Object (File)
Sep 21 2024, 10:57 PM

Details

Summary

Fixes: 1d126e9b9474 carp: Widen epoch coverage
MFC after: 1 day

Test Plan

On current or stable/13 with INVARIANTS, force a virtual host to be master from backup by ifconfig vhid 100 state master.

stable/12 is not affected.

#!/bin/sh

kldload -nq carp

ep=$( ifconfig epair create inet 192.168.1.1/24 )

ifconfig $ep vhid 100 advskew 100 192.168.1.100/32 alias
ifconfig $ep vhid 100 state backup
# The following will triggers assertion panic without the fix
ifconfig $ep vhid 100 state master

sleep 0.5
ifconfig $ep destroy

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

zlei requested review of this revision.Mar 16 2023, 8:06 AM

It'd be good to add this case to the carp tests. Can you? If not, I'll take a look. I'm doing some carp work as well anyway.

This revision is now accepted and ready to land.Mar 16 2023, 8:25 AM
In D39113#890304, @kp wrote:

It'd be good to add this case to the carp tests. Can you? If not, I'll take a look. I'm doing some carp work as well anyway.

I'm not familiar with the atf test framework, and I have no bandwidth. I'm still struggling with route cache framework yet :(

In D39113#890307, @zlei wrote:
In D39113#890304, @kp wrote:

It'd be good to add this case to the carp tests. Can you? If not, I'll take a look. I'm doing some carp work as well anyway.

I'm not familiar with the atf test framework, and I have no bandwidth. I'm still struggling with route cache framework yet :(

Okay, no worries. I'll write up a quick test and copy you on the review.

BTW, without INVARIANTS, I have not encounter any problem even run the test script (as in TEST PLAN) multiple times. Maybe I have good luck, or there're something hide behind net epoch .

carp_master_down_locked() will call ip_output() , arprequest() and nd6_na_output() those all require net epoch. Although the ifp holds a reference but I think without net epoch the system will behave badly, possibly access freed memory.

Can @melifaro comment on this?

In D39113#890320, @zlei wrote:

BTW, without INVARIANTS, I have not encounter any problem even run the test script (as in TEST PLAN) multiple times. Maybe I have good luck, or there're something hide behind net epoch .

carp_master_down_locked() will call ip_output() , arprequest() and nd6_na_output() those all require net epoch. Although the ifp holds a reference but I think without net epoch the system will behave badly, possibly access freed memory.

The epoch system serves to make sure that resources we rely on (such as the ifp, or route or neighbour objects) don't go away while we're using them. It's pretty unlikely for them to actually go away in the tiny slice of time where it'd matter, so it's completely normal that you didn't see any issues even with the missing epoch calls.

In D39113#890320, @zlei wrote:

BTW, without INVARIANTS, I have not encounter any problem even run the test script (as in TEST PLAN) multiple times. Maybe I have good luck, or there're something hide behind net epoch .

carp_master_down_locked() will call ip_output() , arprequest() and nd6_na_output() those all require net epoch. Although the ifp holds a reference but I think without net epoch the system will behave badly, possibly access freed memory.

Can @melifaro comment on this?

Kristof already replied, so just adding a few points here. Epoch provides safety guarantees that the objects we have pointers will not be freed. In a sense, it's somewhat similar to the locks - without them, the concurrent code will work, but not always.
I typically develop/test on the kernels w/ INVARIANTS first, as it really helps to catch issues early - as we've plenty of INVARIANTS-only asserts.

@kp @melifaro
Thanks for the explaining ;)

I typically develop/test on the kernels w/ INVARIANTS first, as it really helps to catch issues early - as we've plenty of INVARIANTS-only asserts.

Is it possible that some tools / compilers can catch customized ASSERTS such as NET_EPOCH_ASSERT and hint / blame unsatisfied conditions?
Although runtime tests are still needed but tests are always not enough .

In D39113#890345, @zlei wrote:

@kp @melifaro
Thanks for the explaining ;)

I typically develop/test on the kernels w/ INVARIANTS first, as it really helps to catch issues early - as we've plenty of INVARIANTS-only asserts.

Is it possible that some tools / compilers can catch customized ASSERTS such as NET_EPOCH_ASSERT and hint / blame unsatisfied conditions?

Coverity could check some of these. But for the rest of the tools it's just writing some (or more) tests.

Although runtime tests are still needed but tests are always not enough .

There are some efforts for reduce the bar for writing kernel unit-tests like https://github.com/rysto32/freebsd/tree/sysunit , but we don't have it fully working yet.