Page MenuHomeFreeBSD

Multiple fixes to the NSS caching system
Needs ReviewPublic

Authored by david_crossfamilyweb.com on Jan 13 2023, 11:21 PM.
Referenced Files
F102587789: D38047.diff
Thu, Nov 14, 11:02 AM
Unknown Object (File)
Tue, Nov 12, 4:18 PM
Unknown Object (File)
Sat, Nov 9, 2:52 PM
Unknown Object (File)
Mon, Nov 4, 8:27 PM
Unknown Object (File)
Sun, Nov 3, 1:08 AM
Unknown Object (File)
Wed, Oct 30, 7:42 PM
Unknown Object (File)
Mon, Oct 28, 5:08 PM
Unknown Object (File)
Mon, Oct 28, 3:10 PM

Details

Reviewers
None
Group Reviewers
Src Committers
Summary

This addresses multiple issues in the NSS caching system, the hilights are:

  1. the caching loop prevention doesn't work, executables must be compiled -export-dynamic to have their symbols visible to dlsym() a. As a result of this not even working currently, take this opportunity to rename the symbol correctly.
  2. getgroupmembership() isn't cached at all if a NSS provider (ie LDAP) provides its own version, since that gets intercepted before the fallback is executed
  3. passwd and group lookups in nscd are not threadsafe a. I started to fix getservent as well before discovering this set uses thread-local-storage to fix it, I briefly debated migrating that to the passwd and group entries but feel this is a superior solution.
  4. negative cached lookups when using perform_lookups were not properly returned.
Test Plan

Tested quite extensively (as in I am actively using this) with nscd against nss_ldap with perform-actual-lookups for passwd and group. Should be tested with that off to ensure that I got the marshaling code correct in libc (aside, I debated unifying the marshaling/unmarshaling code with libc, but it would require a bit more of a rewrite as they use the code slightly differently)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Does this need any corresponding updates to manual pages, eg getpwent(3), getgrent(3), nsswitch.conf(5), or nscd(8)? My C reading skills aren't up to telling conclusively.

In D38047#865895, @pauamma wrote:

Does this need any corresponding updates to manual pages, eg getpwent(3), getgrent(3), nsswitch.conf(5), or nscd(8)? My C reading skills aren't up to telling conclusively.

Good question, I would say no, this does not affect WHAT is cached (in terms of databases), it fixes an oversight in the group database implementation and increases thread safety, there are no behavioral changes except for the oversight, which I would argue fixes a bug in incomplete group database handling,

Can you please run the libc pw* tests?

kyua test -k /usr/tests/lib/libc/nss/Kyuafile

Please see man 7 tests for more details.

lib/libc/gen/getgrent.c
245

The ellipses can be removed around the individual conditionals.

lib/libc/net/nsdispatch.c
133

It seems like this could break external compatibility with this symbol. Is there a reason why this should be renamed?

usr.sbin/nscd/agents/group.c
34–39

Could you please sort #include's?

73

(picking an instance)
style(9) says there should be a blank line before return.

98

The ellipses can be removed around the individual conditionals.

214

= vs space.

234–235

Can this be condensed to a single line?

242

Spurious space after memcpy.

243

= vs space.

245

Spurious space after memcpy.

378

This seems like it could result in a double free if the loop terminates and it hits line 392 or 401. Could you please defensively set group_storage = NULL; to avoid a potential future double free?

384

style(9): need a space between = and other portions of the line statement.

usr.sbin/nscd/agents/passwd.c
38

Please sort per style(9).

70–73

Is this indented correctly?

83

I have similar concerns about a double-free here as I did above.

260

style(9): missing space before/after =

270–272

style(9): line 271/272/300/301 should be indented with a hard tab and 4 spaces.

289

This should replace *result on line 208.

usr.sbin/nscd/query.c
806–813

Can this code be put on a single line and stay under 80 columns or be reindented similar to what I noted above about style(9).

pstef added inline comments.
usr.sbin/nscd/agents/group.c
73

Can you quote exactly? I can't find it. There is a mention of an optional blank "line at the beginning of functions with no local variables", but that's been optional for quite some time.

Please provide more diff context (diff -U999999).

usr.sbin/nscd/agents/group.c
92

size_t *buffer_size

lib/libc/net/nsdispatch.c
133

It could, however in this case because of the bug referenced in the Makefile (-export-dynamic), this check *never* worked for nscd. If there is some hypothetical other provider for the caching interface in libc for nameservice caching that was compiled correctly it could theoretically break that, however the caching system is relatively hardcoded for nscd, the socket_path is set to a #defined value of /var/run/nscd, so said hypothetical daemon would need to ninja its way deep into the intrenals of libc. Additionally the marshaling and unmarshaling functions would need to be in sync, in which case this entire change (or ANY change) to the nscd system could break things since it wouldn't understand new databases or types; because of this I would regard the NSCD interface as a private interface and we have a high degree of freedom to changea and improve it,

I'd like to at least make the name consistent with its purpose. Ideally, and eventually, I'd like to just make this a call into an internal/private method in libc to set a static variable to set behavior vs this kinda gross dynamic symbol reflection

usr.sbin/nscd/agents/group.c
234–235

This would cause the line to be just over 80 columns and wrap.

usr.sbin/nscd/agents/passwd.c
70–73

it appears to be on my terminal (using 8 space tabs everything lines up)

289

I don't follow this, there is only one occurance of *result in agents/passwd.c, (about line 71), and that function doesn't call this one?

usr.sbin/nscd/query.c
806–813

definitely not, the ; at the end of that line is already past 80 columns by a character.

Ran entire kyua test suite as follows:

$ kyua test -k /usr/tests/lib/libc/nss/Kyuafile
Results file id is usr_tests_lib_libc_nss.20230804-045215-435279
Results saved to /usr/home/david/.kyua/store/results.usr_tests_lib_libc_nss.20230804-045215-435279.db

96/96 passed (0 failed)

The .db file is 11Meg; I can attach as needed.

Cleaned up erroneous patches due to my tree being out of date with target.

*bump*

I believe I have addressed all of reviews; it would be great to get this in for 14.1 so I don't keep having to patch my local copies

Any update on getting this committed? Might this make it in for 15-release?

Rebase and remove unnecessary () in an if check (per review comment)

*bump*

Possible to get some eyes on this and see if we can get this across?

arrowd added inline comments.
usr.sbin/nscd/Makefile
13

typo dlsym

I think the cycle detection patch is fine and should be committed on its own. If you send me (markj@freebsd.org) a patch with that piece of the change, I'll commit it.

lib/libc/gen/getgrent.c
371
377
usr.sbin/nscd/agents/group.c
137

Please avoid adding spurious whitespace changes.

usr.sbin/nscd/agents/passwd.c
339

Please avoid adding spurious whitespace changes.

usr.sbin/nscd/query.c
809

Could you please explain in more detail why this change is correct? It's hard for me to see where this response gets handled. I spent a bit of time staring at libc's nscache.c, but it's still not clear to me.

lib/libc/gen/getgrent.c
370

Don't we only need to check the buffer against the number of groups that we'll actually return? i.e., MIN(original_ngroups, maxgroups) as in the memcpy() below.

usr.sbin/nscd/agents/group.c
361

Isn't the gids buffer getting leaked here? If errno != 0, then I don't expect that free(gids) call in the loop tail to be executed.

usr.sbin/nscd/agents/passwd.c
282

There is some code after the getpw() calls which might modify errno (e.g., free(), or whatever happens in TRACE_OUT()). It would be better to save the errno value in a local variable immediately after the getpw() call, and use that here instead of errno.

lib/libc/gen/getgrent.c
370

I am not sure I am following, so let me explain the intent of the code.

The marshalling routine formats the RPC response as follows, a integer length field (the number of gid_t's in the array) followed by an list of gid_t.

What gets passed into the unmarshaling routine is the raw buffer from the protocol read, and the buffer size that was allocated to read the data; it is POSSIBLE that the number of gid_t's listed in the lenght field would be LONGER than the buffer given to the demarshaling, if so this is a protocol error, else we would be copying who knows what memory into the buffer, also it means there would be unread data hanging out on a buffer or socket somewhere messing up protocol state machine. Given this is a protocol error (it really shouldn't happen, the rest of the protocol code should be ensuring that we get an entire message) I return this as an NS_UNAVAIL to indicate the cache server is down (shouldn't treat a miss here as a NOTFOUND)

So what this code does is first grab the integer off of the marhsalled object and then calculates that the buffer is at least as long as an integer + the number of expected gid_t's * sizeof(gid_t).

After it has done this it copies all of the gid_t's that will fit into the allocated space and returns.

So the first check is about ensuring protocol and memory integrity, not that it will actually fit.

usr.sbin/nscd/agents/group.c
361

correct, it won't get executed, however if it doesn't get executed it means that the short circuit evaluation of the logic was false and the ENTIRE loop terminates; ie. IF the while is true, it MUST free, otherwise the loop terminates. I do see that I should free(gids) in the errno !=0 case however (line 362). I will add that.

Otherwise gids is unconditionally free()d on line 368 below

usr.sbin/nscd/agents/passwd.c
282

ugh. good call on this; I created sub-errno's in other contexts, I should have here as well; also adding to the groups lookup database. Likely this needs to be added all over this code.

usr.sbin/nscd/query.c
809

This is to mimic the negative response message from above. See line 746. If there is a negative hit, it pulls the negative result from the cache and sends it. Well, what IS a negative result? we see on line 793 what has just been added to the negative entry. So if it was ALREADY there, it would have sent that back (as on line 746).

More generally, if this isn't there the client just hangs; my guess is that given an error code of 0 it expects there to be SOME data there and it just waits forever for that next bit of data that is not coming.

If you look at the unmarshal functions they have checks for this that translates it to NS_NOTFOUND. I am honestly not a big fan of doing it this way, but it was took big a (breaking) change to take on.

Interim PR changes while I work on splitting this up to different sub PRs. Don't want my work lost

usr.sbin/nscd/agents/group.c
361

Right, this leak is what I was referring to.

More generally, I think having free() in the loop condition like this is a bit too clever and would prefer to avoid that.

usr.sbin/nscd/agents/group.c
361

Too clever; agreed. In fact I thought the same thing when I re-read it, and was sure I had to have copied it from somewhere else; but cannot find where.

also getgrouplist actually reports back the needed size, no need for the += NGROUPS like that, so I am re-working this entire loop.

rebasing with minor additional changes before I begin the work to split this up