This fixes panic when trying to run strace(8) from Focal.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 42085 Build 38973: arc lint + arc unit
Event Timeline
I think this is reasonable but am curious why we get +ve and -ve values, do you know where they originate?
I do not understand this change. Isn't it bsd_to_linux_errno that must handle all that negative FreeBSD error codes? There is no error for EJUSTRETURN at all, BTW.
No, bsd_to_linux_errno() assumes it's always called with correct value, and AFAIK other callers keep to it.
sys/amd64/linux/linux_ptrace.c | ||
---|---|---|
583 | Is this really accurate? Should EJUSTRETURN be reported as an error? If it's really just EJUSTRETURN then perhaps special case that instead? Have you looked to see what Linux returns for the return from execve? |
sys/amd64/linux/linux_ptrace.c | ||
---|---|---|
583 | EJUSTRETURN means that kernel requests the return code to put the current usermode frame (td_frame) into CPU registers without any adjustments. It is kind of 'no error', but of course the state or the registers might be so that for userspace it is an error reported. EJUSTRETURN was primarly invented for sigreturn(2), and then setcontext(2). |
sys/amd64/linux/linux_ptrace.c | ||
---|---|---|
577 | should we set rval in the EJUSTRETURN case? |
sys/amd64/linux/linux_ptrace.c | ||
---|---|---|
577 | Not sure if I understand. We need to set it to _something_, don't we? |
sys/amd64/linux/linux_ptrace.c | ||
---|---|---|
577 | I suspect Ed' point is that the rval value should be just %eax, same as sr_error == 0 case, which makes an argument for joining sr_error == 0 and == EJUSTRETURN cases. Does Linux have anything similar to EJUSTRETURN? Judging from the ABI, where errors are designated as small negative numbers, it probably does not? |
sys/amd64/linux/linux_ptrace.c | ||
---|---|---|
583 | this version makes sense to me (although I don't think it needs to be wrapped on 2 lines) |
So after the second thought, would something like if (sr.sr_error == EJUSTRETURN) { is_error = IS_ERR_VALUE((sr.sr_retval[0]); ...} more appropriate? IS_ERR_VALUE() is the linux thing which is roughly ((x) >= (uintptr_t)-MAX_ERRNO), not sure how it is spelled in Linuxolator.
Not sure if I follow. The return value there is BSD errno, not Linux errno; also, this doesn't seem to be more readable than the current version.
For EJUSTRETURN, you need to determine (from the FreeBSD return frame, according to your statement) if it is error or success return, and then encode the si.exit values. In particular, rval would be either sr_retval[0] if there is no error, or bsd_to_linux_errno() if there is.
EJUSTRETURN means 'take the saved frame content and load it into CPU registers without any changes'. Now compare this with usual syscall return, which puts return values into some registers, set or clear error indicator etc.
But then we would have to actually get to td_frame; sr_retval seems to be only set when sr_error == 0 - and that would also make the whole thing architecture-dependent.
Is there any practical scenario where this can be a problem?
Definitely, this is how EINTR is returned from syscall after a signal handler was run.
I'm not sure if Linux' ptrace API defines it this way. Also, I don't quite understand how it works for FreeBSD binaries, given that kern_ptrace() only fills sr_retval if sr_error == 0. Therefore I'm tempted to commit it as is, particularly given that it fixes a trivially-reproducible panic.
For native FreeBSD we can legitimately expose EJUSTRETURN. This is enough to inform caller that the real state is in the registers file.
For Linux, due to different ABI and absense of EJUSTRETURN, I have no other proposal than to decode the state (to have the code correct).
Okay, now I get it. So, I've looked at strace(1) source code some more, and it contains explicit fallback for cases where the kernel does support PTRACE_GET_SYSCALL_INFO, but for some reason returns PTRACE_SYSCALL_INFO_NONE instead of PTRACE_SYSCALL_INFO_EXIT, so let's just use that.