Differential D47144

nscd step2
AcceptedPublic
Actions

Authored by david_crossfamilyweb.com on Wed, Oct 16, 2:06 AM.

Details

Reviewers

markj

Summary

Fixes for 2 nscd interaction issues:

if we cannot open the connection to the cache server, that is UNAVAILABLE, not NOT_FOUND
when doing a "mp" read (multipart) the nscd server returns -1 on end of stream (and only then). we then pull that in to our local error code where it overlaps with existing -1 error status and we lose the ability to disambiguate a -1 from the remote end indicating end of stream and an error in our code. Fix here is to invert errors from the remote side and use that to determine end of stream. Also fix the return code. if res==0 then it is success, unless we cannot unmarshal, then return that code as is. if res==1 then it is end of stream (and NOFOUND), otherwise it is a system failure (and UNAVAIL).

Note this is the _mp_ cache reader only

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

david_crossfamilyweb.com created this revision.Wed, Oct 16, 2:06 AM

Herald added a subscriber: imp. · View Herald TranscriptWed, Oct 16, 2:06 AM

david_crossfamilyweb.com requested review of this revision.Wed, Oct 16, 2:06 AM

markj added inline comments.Thu, Oct 17, 2:08 PM

lib/libc/net/nscachedcli.c
541	But doesn't this break the realloc loop which calls __cached_mp_read() and expects -2 to mean, "please give me a larger buffer"? I would suggest keeping nscd's error codes, and making -3 the default error code for this function.

david_crossfamilyweb.com added inline comments.Thu, Oct 17, 5:48 PM

lib/libc/net/nscachedcli.c
541	It doesn't. nscd never returns a -2; -2 is used privately within both nscd and libc for the reallocation routines, but when it comes to the protocol nscd always returns the entire data to the socket (ok, this isn't true, there is a bug in nscd where it cannot return elements over 4k, I am working to fix that now, but it is independent of this); or -1 indicating 'end of stream' The -2 for the sizing is later in this function (at line 549-553), but that is internal to this. I debated doing this in nscd but now you are mixing state machines across 2 different applications,and -2 is an internal state. It seems cleaner to have the protocol state be its own state machine, separate from the state machines in libc and nscd and have the shims manage the translation.. For example mixing these you'd then have to 'know' that you cannot send -2 because it will mess up the other side in really weird ways, and/or the receiving side would need to also guard against protecting its internal state machine

markj accepted this revision.Thu, Oct 17, 6:17 PM

markj added inline comments.

lib/libc/net/nscachedcli.c
541	My comment was based on a reading of nscd's on_mp_read_session_read_request_process(), which I believe is where the error code is generated. It first calls cache_mp_read(), which returns -1 if current_item == NULL, which I believe is the "end of stream case", and -2 if the data size is too small for the current item. That error code is written to the connection in on_mp_read_session_read_response_write1(), so I believe it may be returned here. Oh, hmm now, I see, on_mp_read_session_read_request_process() will always allocate enough data for the second call to cache_mp_read(). So -2 can indeed appear "on the wire", but only due to an internal error. I would suggest adding an assert for this case.

This revision is now accepted and ready to land.Thu, Oct 17, 6:17 PM

We shouldn't need it. The only consumer of this function is __ns_mp_cache_read in nscache.clli.
Line 296 we loop until res != -2 (which is an internal state). so when we leave that loop we know we're in some kind of terminal protocol state.

If that is '0' we're success, finalize the result and send it.

If we're NOT 0 then we're some kind of 'other' state. in which case we unconditionally close the stream, kill the read state, and then return NOT_FOUND on the specific end of stream indicator an UNAVAILABLE in all other cases.

I prefer this handling over an assert since it indicates protocol failue that is recoverable vs. SIGABORT the process on NSCD screwing up.

(This is one reason I like keeping protocol and processing states separate, it leads to natural outcomes like this, it wasn't a situation I anticipated, and now that you mention the error in nscd, I definitely want to fix that)

One thing I might want to do is make sure that the returned error code cannot be positive; since IF nscd were to return a positive 2, then it tricks the system into thinking it is the allocation error.