audit: Avoid race caused by starting auditd(8) for testing
ClosedPublic
Actions

Authored by arichardson on Feb 1 2021, 7:17 PM.

Details

Reviewers

asomers

Group Reviewers

Audit

Commits

rGeb52de783a1c: tests/sys/audit: Avoid race caused by starting auditd(8) for testing
rGdf093aa9463b: tests/sys/audit: Avoid race caused by starting auditd(8) for testing

Summary

In the CheriBSD CI we reproducibly see the first test in sys/audit
(administrative:acct_failure) fail due to a missing startup message.
It appears this is caused by a race condition when starting auditd:
service auditd onestart returns as soon as the initial auditd() parent
exits (after the daemon(3) call).
We can avoid this problem by setting up the auditd infrastructure
in-process: libauditd contains audit_quick_{start,stop}() functions that
look like they are ideally suited to this task.
This patch also avoids forking lots of shell processes for each of the 418
tests by using auditon(A_SENDTRIGGER, &trigger, sizeof(trigger)) to check
for a running auditd(8) instead of using service auditd onestatus.

With these two changes (and D28388 to fix the XFAIL'd test) I can now
boot and run cd /usr/tests/sys/audit && kyua test without any failures
in a single-core QEMU instance. Before there would always be at least one
failed test.

Besides making the tests more reliable in CI, a nice side-effect of this
change is that it also significantly speeds up running them by avoiding
lots of fork()/execve() caused by shell scripts:
Running kyua test on an AArch64 QEMU took 315s before and now takes 68s,
so it's roughly 3.5 times faster. This effect is even larger when running
on a CHERI-RISC-V QEMU since emulating CHERI instructions on an x86 host
is noticeably slower than emulating AArch64.

Test Plan

aarch64+amd64 QEMU no longer fail.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

arichardson created this revision.Feb 1 2021, 7:17 PM

Herald added a subscriber: imp. · View Herald TranscriptFeb 1 2021, 7:17 PM

arichardson requested review of this revision.Feb 1 2021, 7:17 PM

Harbormaster completed remote builds in B36660: Diff 83244.Feb 1 2021, 7:17 PM

This looks great! One question: what happens if auditd is already running? Will the tests run successfully? And what's "fake" about the fake auditd?

tests/sys/audit/utils.c
299	If two tests are running in parallel, they might both try to start fake auditd. The only thing that saves you is the fact that the audit tests are currently all marked as is_exclusive.

In D28451#636601, @asomers wrote:

This looks great! One question: what happens if auditd is already running? Will the tests run successfully? And what's "fake" about the fake auditd?

Well, it's not actually a real auditd and doesn't log properly. I'm not sure what else is different. The comment above audit_quick_start() says:

/*
 * Do just what we need to quickly start auditing.  Assume no system logging or
 * notify.  Return:
 *   0	 on success,
 *  -1   on failure.
 */

It also appears to be (completely(?)) in-process, so maybe we don't actually need the cleanup function anymore.

I tried running the tests after running service auditd onestart and they continue to work since the is_auditd_running() returns true and we can use the existing auditd.
However, I'm not sure that is a good idea since at least one test (auditctl_success) changes the running auditd configuration.
Maybe we should be skipping these tests when a running auditd is detected (possibly with an env-var override to test against the running auditd)?

tests/sys/audit/utils.c
299	Ah yes, I keep forgetting that kyua can actually run tests in parallel. I'm not sure what happens if multiple processes run `audit_quick_start`: my guess would be one of them fails. Will try to test this with a SMP QEMU.

Running the tests in parallel (by commenting out the is_exclusive) causes failures as expected (0 != audit_quick_start(): Failed to start fake auditd: File exists).

It also then results in a panic:

root@freebsd-amd64:/usr/tests/sys/audit # kyua -v parallelism=16 test
administrative:adjtime_failure  ->  passed  [0.153s]
administrative:adjtime_success  ->  passed  [0.149s]
administrative:auditctl_failure  ->  failed: /local/scratch/alr48/cheri/freebsd/tests/sys/audit/utils.c:300: 0 != audit_quick_start(): Failed to start fake auditd: File exists  [0.148s]
administrative:auditctl_success  ->  passed  [0.145s]
administrative:auditon_default_failure  ->  passed  [0.057s]
administrative:auditon_default_success  ->  failed: /local/scratch/alr48/cheri/freebsd/tests/sys/audit/utils.c:300: 0 != audit_quick_start(): Failed to start fake auditd: File exists  [0.143s]
panic: audit_syscall_enter: td->td_ar != NULL
cpuid = 0
time = 1612265481
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0093bb59d0
vpanic() at vpanic+0x188/frame 0xfffffe0093bb5a20
panic() at panic+0x43/frame 0xfffffe0093bb5a80
audit_syscall_enter() at audit_syscall_enter+0x1b7/frame 0xfffffe0093bb5ac0
amd64_syscall() at amd64_syscall+0x702/frame 0xfffffe0093bb5bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0093bb5bf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x8003e556a, rsp = 0x7fffffffca48, rbp = 0x7fffffffca60 ---
KDB: enter: panic
[ thread pid 809 tid 100093 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10a084e(%rip)

arichardson mentioned this in D28462: usr.sbin/praudit: fix tests after 5619d49e07.Feb 2 2021, 3:58 PM

ping? I'm not sure we can make it possible to run tests in parallel, so the is_exclusive needs to stay.

Call atexit() to make it easier to run outside of kyua

Harbormaster completed remote builds in B37053: Diff 84016.Feb 16 2021, 3:38 PM

remove unintended change

Harbormaster completed remote builds in B37054: Diff 84017.Feb 16 2021, 3:39 PM

I think we can commit this. But please open a bug for the panic you discovered. And it might be a good idea to add a comment to the Makefile explaining the necessity for is_exclusive.

This revision is now accepted and ready to land.Feb 18 2021, 12:27 AM

Filed the bugreport as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253616

Closed by commit rGdf093aa9463b: tests/sys/audit: Avoid race caused by starting auditd(8) for testing (authored by arichardson). · Explain WhyFeb 18 2021, 2:04 PM

This revision was automatically updated to reflect the committed changes.

arichardson added a commit: rGdf093aa9463b: tests/sys/audit: Avoid race caused by starting auditd(8) for testing.

arichardson added a commit: rGeb52de783a1c: tests/sys/audit: Avoid race caused by starting auditd(8) for testing.Mar 17 2021, 10:23 PM