Page MenuHomeFreeBSD

linux: Make PTRACE_GET_SYSCALL_INFO handle negative errnos
ClosedPublic

Authored by trasz on Oct 7 2021, 1:53 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 22, 11:17 AM
Unknown Object (File)
Fri, Jan 17, 10:21 PM
Unknown Object (File)
Wed, Jan 15, 7:38 PM
Unknown Object (File)
Tue, Jan 14, 10:50 PM
Unknown Object (File)
Mon, Jan 13, 11:15 PM
Unknown Object (File)
Mon, Jan 13, 11:00 PM
Unknown Object (File)
Mon, Jan 13, 10:48 PM
Unknown Object (File)
Fri, Jan 10, 11:16 PM
Subscribers

Details

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 42318
Build 39206: arc lint + arc unit

Event Timeline

trasz requested review of this revision.Oct 7 2021, 1:53 PM

I think this is reasonable but am curious why we get +ve and -ve values, do you know where they originate?

I think this is reasonable but am curious why we get +ve and -ve values, do you know where they originate?

I think it's EJUSTRETURN after exec.

I do not understand this change. Isn't it bsd_to_linux_errno that must handle all that negative FreeBSD error codes? There is no error for EJUSTRETURN at all, BTW.

No, bsd_to_linux_errno() assumes it's always called with correct value, and AFAIK other callers keep to it.

sys/amd64/linux/linux_ptrace.c
627

Is this really accurate? Should EJUSTRETURN be reported as an error? If it's really just EJUSTRETURN then perhaps special case that instead?

Have you looked to see what Linux returns for the return from execve?

sys/amd64/linux/linux_ptrace.c
627

EJUSTRETURN means that kernel requests the return code to put the current usermode frame (td_frame) into CPU registers without any adjustments. It is kind of 'no error', but of course the state or the registers might be so that for userspace it is an error reported.

EJUSTRETURN was primarly invented for sigreturn(2), and then setcontext(2).

Only expect EJUSTRETURN; don't claim it's an error. This one makes strace happy.

sys/amd64/linux/linux_ptrace.c
615

should we set rval in the EJUSTRETURN case?

sys/amd64/linux/linux_ptrace.c
615

Not sure if I understand. We need to set it to _something_, don't we?

sys/amd64/linux/linux_ptrace.c
615

I suspect Ed' point is that the rval value should be just %eax, same as sr_error == 0 case, which makes an argument for joining sr_error == 0 and == EJUSTRETURN cases.

Does Linux have anything similar to EJUSTRETURN? Judging from the ABI, where errors are designated as small negative numbers, it probably does not?

Join the two conditionals.

sys/amd64/linux/linux_ptrace.c
627

this version makes sense to me (although I don't think it needs to be wrapped on 2 lines)

So after the second thought, would something like if (sr.sr_error == EJUSTRETURN) { is_error = IS_ERR_VALUE((sr.sr_retval[0]); ...} more appropriate? IS_ERR_VALUE() is the linux thing which is roughly ((x) >= (uintptr_t)-MAX_ERRNO), not sure how it is spelled in Linuxolator.

Remove useless line wrapping.

In D32355#733679, @kib wrote:

So after the second thought, would something like if (sr.sr_error == EJUSTRETURN) { is_error = IS_ERR_VALUE((sr.sr_retval[0]); ...} more appropriate? IS_ERR_VALUE() is the linux thing which is roughly ((x) >= (uintptr_t)-MAX_ERRNO), not sure how it is spelled in Linuxolator.

Not sure if I follow. The return value there is BSD errno, not Linux errno; also, this doesn't seem to be more readable than the current version.

In D32355#733679, @kib wrote:

So after the second thought, would something like if (sr.sr_error == EJUSTRETURN) { is_error = IS_ERR_VALUE((sr.sr_retval[0]); ...} more appropriate? IS_ERR_VALUE() is the linux thing which is roughly ((x) >= (uintptr_t)-MAX_ERRNO), not sure how it is spelled in Linuxolator.

Not sure if I follow. The return value there is BSD errno, not Linux errno; also, this doesn't seem to be more readable than the current version.

For EJUSTRETURN, you need to determine (from the FreeBSD return frame, according to your statement) if it is error or success return, and then encode the si.exit values. In particular, rval would be either sr_retval[0] if there is no error, or bsd_to_linux_errno() if there is.

In D32355#734010, @kib wrote:
In D32355#733679, @kib wrote:

So after the second thought, would something like if (sr.sr_error == EJUSTRETURN) { is_error = IS_ERR_VALUE((sr.sr_retval[0]); ...} more appropriate? IS_ERR_VALUE() is the linux thing which is roughly ((x) >= (uintptr_t)-MAX_ERRNO), not sure how it is spelled in Linuxolator.

Not sure if I follow. The return value there is BSD errno, not Linux errno; also, this doesn't seem to be more readable than the current version.

For EJUSTRETURN, you need to determine (from the FreeBSD return frame, according to your statement) if it is error or success return, and then encode the si.exit values. In particular, rval would be either sr_retval[0] if there is no error, or bsd_to_linux_errno() if there is.

Isn't EJUSTRETURN always a success?

Isn't EJUSTRETURN always a success?

EJUSTRETURN means 'take the saved frame content and load it into CPU registers without any changes'. Now compare this with usual syscall return, which puts return values into some registers, set or clear error indicator etc.

In D32355#734021, @kib wrote:

Isn't EJUSTRETURN always a success?

EJUSTRETURN means 'take the saved frame content and load it into CPU registers without any changes'. Now compare this with usual syscall return, which puts return values into some registers, set or clear error indicator etc.

But then we would have to actually get to td_frame; sr_retval seems to be only set when sr_error == 0 - and that would also make the whole thing architecture-dependent.

Is there any practical scenario where this can be a problem?

Is there any practical scenario where this can be a problem?

Definitely, this is how EINTR is returned from syscall after a signal handler was run.

In D32355#734342, @kib wrote:

Is there any practical scenario where this can be a problem?

Definitely, this is how EINTR is returned from syscall after a signal handler was run.

I'm not sure if Linux' ptrace API defines it this way. Also, I don't quite understand how it works for FreeBSD binaries, given that kern_ptrace() only fills sr_retval if sr_error == 0. Therefore I'm tempted to commit it as is, particularly given that it fixes a trivially-reproducible panic.

In D32355#734342, @kib wrote:

Is there any practical scenario where this can be a problem?

Definitely, this is how EINTR is returned from syscall after a signal handler was run.

I'm not sure if Linux' ptrace API defines it this way. Also, I don't quite understand how it works for FreeBSD binaries, given that kern_ptrace() only fills sr_retval if sr_error == 0. Therefore I'm tempted to commit it as is, particularly given that it fixes a trivially-reproducible panic.

For native FreeBSD we can legitimately expose EJUSTRETURN. This is enough to inform caller that the real state is in the registers file.

For Linux, due to different ABI and absense of EJUSTRETURN, I have no other proposal than to decode the state (to have the code correct).

Let the ptracer fallback to another method.

In D32355#736013, @kib wrote:
In D32355#734342, @kib wrote:

Is there any practical scenario where this can be a problem?

Definitely, this is how EINTR is returned from syscall after a signal handler was run.

I'm not sure if Linux' ptrace API defines it this way. Also, I don't quite understand how it works for FreeBSD binaries, given that kern_ptrace() only fills sr_retval if sr_error == 0. Therefore I'm tempted to commit it as is, particularly given that it fixes a trivially-reproducible panic.

For native FreeBSD we can legitimately expose EJUSTRETURN. This is enough to inform caller that the real state is in the registers file.

For Linux, due to different ABI and absense of EJUSTRETURN, I have no other proposal than to decode the state (to have the code correct).

Okay, now I get it. So, I've looked at strace(1) source code some more, and it contains explicit fallback for cases where the kernel does support PTRACE_GET_SYSCALL_INFO, but for some reason returns PTRACE_SYSCALL_INFO_NONE instead of PTRACE_SYSCALL_INFO_EXIT, so let's just use that.

This revision is now accepted and ready to land.Oct 23 2021, 4:44 PM