
mount_nfs: make temporary DNS failure non-fatal with background mode
ClosedPublic

Authored by glebius on Feb 26 2025, 11:08 PM.

Details

Summary

A typical problem with network mounts is remote equipment not being
available when our host boots up after a power failure. Even if you
properly configure the boot order of all local services and wait for the
link to come up on your NIC, you may still boot faster than some
intermediate switch on the network or the DNS server itself. Let's refer
to this as a "server room boot race". For NFS mounts whose hostname is
listed in hosts(5), the race is addressed by a retry loop on NFS mount
timeout. However, a DNS resolution timeout is treated differently from an
NFS mount timeout: we fail on the former and keep retrying on the latter.

From the feedback received on current@, I see that the problem is so old
that people have got used to it and see it as desired behavior rather than
a problem. For those who are affected by the problem, they suggest
hosts(5) as a solution. Note that using hosts(5) isn't scalable, and
using bare IP addresses is neither scalable nor compatible with
Kerberized mounts.

A trade-off solution would be to enable the retry cycle over DNS timeouts
only when background mode is specified, which is typical in fstab(5)
and very uncommon on the command line. That would address the server room
boot race problem without breaking POLA for the command line.
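
The trade-off above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: resolve_fn, resolve_with_retry, max_tries and flaky_resolver are hypothetical names, the resolver is passed in as a callback so the loop can be exercised without a network, and the retry count is bounded only to keep the sketch finite.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resolver callback: returns 0 on success, nonzero on failure. */
typedef int (*resolve_fn)(const char *host);

/*
 * Retry name resolution only when background mode was requested.
 * In foreground mode the first failure is returned immediately,
 * which keeps the command-line behavior unchanged.
 */
static int
resolve_with_retry(resolve_fn resolve, const char *host, bool background,
    int max_tries)
{
	int ecode;
	int tries = 0;

	do {
		ecode = resolve(host);
		tries++;
		/* A real loop would sleep between attempts; omitted here. */
	} while (ecode != 0 && background && tries < max_tries);

	return (ecode);
}

/* Stand-in resolver for illustration: fails twice, then succeeds. */
static int flaky_calls;

static int
flaky_resolver(const char *host)
{
	(void)host;
	return (++flaky_calls <= 2 ? -1 : 0);
}
```

With the stand-in resolver, a foreground call gives up after the first failure, while a background call keeps going until the third attempt succeeds.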

One remaining problem is that our resolver doesn't differentiate between
a negative answer from the DNS server and a timeout. Once the resolver is
fixed, we should retry only on the EAI_AGAIN error and exit the retry
cycle on EAI_NONAME. Until this is fixed, we have a bug: a mistyped
hostname in your fstab(5) would result in endless attempts to resolve it.
This is a lesser evil than a host with a correctly written hostname in
fstab(5) coming up without necessary mounts due to the boot race. And the
resolver is going to be fixed eventually.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

I see no problem with this, in general.

Looks ok to me. You can decide if retries make sense for
ecode == EAI_NONAME.

sbin/mount_nfs/mount_nfs.c
662

Btw, when I tried an invalid name while DNS
was working correctly, the ecode was EAI_NONAME.

It might not be worth retrying it for that case, but
I'll leave it up to you.

This revision is now accepted and ready to land. Feb 27 2025, 11:39 AM

This revision is rebased on top of https://reviews.freebsd.org/D49411.

Now EAI_NONAME is ignored and only EAI_AGAIN causes retries.
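
As a rough illustration of that rule (not the code from the diff), the decision could be expressed as a small predicate over the getaddrinfo(3) error code. The helper names dns_error_is_transient and should_retry_resolve are hypothetical; only EAI_AGAIN and EAI_NONAME come from <netdb.h>.

```c
#include <netdb.h>
#include <stdbool.h>

/*
 * Hypothetical helper: treat only EAI_AGAIN (temporary failure in name
 * resolution) as transient. EAI_NONAME and every other code are final.
 */
static bool
dns_error_is_transient(int ecode)
{
	return (ecode == EAI_AGAIN);
}

/*
 * Hypothetical helper: retry only in background mode and only when the
 * resolver reported a temporary failure.
 */
static bool
should_retry_resolve(int ecode, bool background)
{
	return (background && dns_error_is_transient(ecode));
}
```

Under this sketch a mistyped hostname that yields EAI_NONAME fails right away, even with background mode enabled.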

This revision now requires review to proceed. Wed, Mar 19, 12:00 AM
This revision is now accepted and ready to land. Thu, Mar 20, 1:45 AM