Page MenuHomeFreeBSD

fspacectl(2): Clarifies the return values
ClosedPublic

Authored by khng on Aug 18 2021, 6:34 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 11:21 PM
Unknown Object (File)
Sun, Jan 12, 1:59 PM
Unknown Object (File)
Tue, Jan 7, 2:28 PM
Unknown Object (File)
Dec 27 2024, 3:48 AM
Unknown Object (File)
Dec 23 2024, 6:18 AM
Unknown Object (File)
Dec 20 2024, 12:52 PM
Unknown Object (File)
Dec 11 2024, 3:04 AM
Unknown Object (File)
Dec 1 2024, 6:00 AM
Subscribers

Details

Summary

rmacklem@ spotted two things in the system call:

  • Upon returning from a successful operation, vop_stddeallocate can update rmsr.r_offset to a value greater than file size. This behavior, although being harmless, can be confusing.
  • The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is undocumented.

This commit has the following changes:

  • vop_stddeallocate and shm_deallocate to bound the the affected area further by the file size.
  • The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is documented.
  • The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return len is explicitly documented the be the value 0, and the return offset is restricted to be the smallest of off + len and current file size suggested by kib@. This semantic allows callers to interact better with potential file size growth after the call.

Sponsored by: The FreeBSD Foundation

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

khng requested review of this revision.Aug 18 2021, 6:34 PM

I do not understand the code change. IMO you should copy out offset always. It is the lower layer that must provide right offset matching len.

This revision is now accepted and ready to land.Aug 18 2021, 6:50 PM
In D31604#712546, @kib wrote:

I do not understand the code change. IMO you should copy out offset always. It is the lower layer that must provide right offset matching len.

To clarify this, here was a discussion between me and rmacklem@: https://lists.freebsd.org/archives/freebsd-fs/2021-August/000343.html .

In D31604#712579, @khng wrote:
In D31604#712546, @kib wrote:

I do not understand the code change. IMO you should copy out offset always. It is the lower layer that must provide right offset matching len.

To clarify this, here was a discussion between me and rmacklem@: https://lists.freebsd.org/archives/freebsd-fs/2021-August/000343.html .

I read fs@ of course. My point is that there is a bug manifesting as r_offset advancing too much. You are covering it instead of fixing, also making interface unnecessarily ugly.

In vop_stddeallocate, bound the affected area to be within file size.

This revision now requires review to proceed.Aug 18 2021, 7:57 PM
khng planned changes to this revision.Aug 18 2021, 7:57 PM
khng edited the summary of this revision. (Show Details)

vop_stddeallocate: Bound the incoming length to be within the current file size.

sys/kern/vfs_default.c
1186–1191

Is it possible that you get len < 0 during the loop execition? I think not, but anyway IMO it is better to change the len calculation as you did but not add the len <= check, and keep loop condition alone. Then do if (len < 0) len = 0; right after the loop, at out: label. Perhaps add one-line comment that this handles the case of offset behind EOF.

khng marked an inline comment as done.Aug 18 2021, 9:32 PM
kib added inline comments.
sys/kern/vfs_default.c
1184

Unrelated to the review.

I do not think this check and break under it does anything useful for the case when vn_deallocate_impl() is called without IO_NODELOCKED. It is probably reverse, the check and premature return would eat more CPU then.

This revision is now accepted and ready to land.Aug 18 2021, 10:16 PM
sys/kern/vfs_default.c
1147

Won't this break when offset > va.va_size?

Don't you need something like

if (va.va_size > (uint64_t)offset)
      len = omin(va.va_size - offset, *ap->a_len);
else
      len = 0;

I was hoping to have a discussion about the semantics of returned
r_offset when done (r_len == 0) over on freebsd-fs@, but since we
are here.

I don't have a strong opinion on what the correct reply is.

  • I did note that returning with r_offset > file size was a little weird.
  • I also think that setting r_offset = file size when the input r_offset > file size is also a little weird.

Another possible semantic would be to return r_offset == 0
when returned r_len == 0 (ie. when done)?

khng marked an inline comment as done.Aug 19 2021, 6:53 AM
khng added inline comments.
sys/kern/vfs_default.c
1147

when offset > va.va_size, len goes negative and would be handled after the loop.

sys/kern/vfs_default.c
1147

Righto. I missed the change you have at line#1187-1189.

I was hoping to have a discussion about the semantics of returned
r_offset when done (r_len == 0) over on freebsd-fs@, but since we
are here.

I don't have a strong opinion on what the correct reply is.

  • I did note that returning with r_offset > file size was a little weird.
  • I also think that setting r_offset = file size when the input r_offset > file size is also a little weird.

Another possible semantic would be to return r_offset == 0
when returned r_len == 0 (ie. when done)?

IMO the main priority there is to make consumer able to non-ambiguously determine what parts of the file changed. In particular, if nothing changed, it should be clearly indicated as well.

Right now the result range is defined in the man page as 'is updated to contain the unprocessed operation range after the system call returns'. From this interpretation, keeping offset intact but len set to zero is fine.

Regardless of the outcome of this conversation, I believe that result range should be explicitly documented, with precise explanation of the returned values.

I think Kostik makes a good point, in that the results should
indicate what range was zeroed.

The case where r_offset >= file size when the
call is done is essentially a "no-op" and just returns
with r_len == 0.
--> I think the current code in main gets it right by just

leaving r_offset unchanged, which indicates no bytes
were zeroed.

The case where r_offset < file size but r_offset + r_len > file size
zeros to EOF, setting r_len == 0.
--> I think the current code in main get this right too, by

returning r_offset == file size indicating how many bytes were
zeroed.

For NFS, the file's size can change at any time (by another client
doing a write/truncate), but the RPC will reply with the file's size
immediately after the Deallocate, which can be used to set r_offset
upon success to be compatible with the above.

Maybe you could post on freebsd-fs@ with your preferred
semantics and see if others have an opinion.
At this point my preferred semantics would be the above,
which is what is already in main.

The man page changes look ok to me, but I agree with Kostik
that what is returned needs to be documented.

Maybe you could post on freebsd-fs@ with your preferred
semantics and see if others have an opinion.
At this point my preferred semantics would be the above,
which is what is already in main.

The man page changes look ok to me, but I agree with Kostik
that what is returned needs to be documented.

Just posted it on freebsd-fs@. My opinion is to document the case that for r_len == 0 (i.e. a complete and successful operation), if rqsr.r_offset is greater than EOF then it is left intact and copied into rmsr.r_offset as well; if rqsr.r_offset is within EOF then it is not going to be set to a value beyond EOF, so that it caters the OpenZFS case as well. (For OpenZFS, zfs_freesp works by deallocating from the ending part all the way to the starting part of the specified free range.). In that case all implementations so far is going to fit into the bracket.

EDIT: The another approach I talked about is to define the result of rmsr.r_offset in a way that allows callers to know the effective range that the system call processed in case rmsr.r_len == 0. Callers can exactly know the number of bytes being handled by comparing rqsr.r_offset and rmsr.r_offset. By subtracting rqsr.r_offset from rmsr.r_offset, callers know what part of the processed range is within EOF boundaries. If rmsr.r_offset - rqsr.r_offset is zero that indicates the rqsr range is totally beyond EOF, otherwise if rmsr.r_offset - rqsr.r_offset != rqsr.r_len parts of the operation falls outside of EOF. Both approaches are compatible with the current interpretation regarding unprocessed range in the manual page.

  • Adopt the semantic suggested by kib@, which is compatible with the current documentation for callers.
    • The semantic is documented in fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9) explicitly.
  • shm_deallocate is fixed in the same way as vop_stddeallocate, plus return the off and len in all cases.
This revision now requires review to proceed.Aug 21 2021, 4:57 PM
  • Spotted some fspacectl.2 inconsistencies.

offset -= len -> offset += len in vop_stddeallocate.

offset -= len -> offset += len in shm_deallocate.

sys/kern/uipc_shm.c
1952

Is it possible that len < 0 there? I am more about the correction of off in the line below, I do not quite understand why is it needed.

sys/kern/vfs_default.c
1147

I strongly suggest casting va_size to off_t.

khng marked 2 inline comments as done.
  • Move the SHM size checking block to beginning
  • suggested va_size casting
sys/kern/uipc_shm.c
1952

I moved the block to above to make it clearer. The purpose is to convert the operation to a no-op if offset passed to the call is beyond SHM size.

Oops I picked up a stale patch smh.

sys/kern/uipc_shm.c
1907–1909

off_t is signed, so overflow is undefined. You cannot check/assert for overflow this way

khng marked an inline comment as done.

uipc_shm: Make the checking more obvious.

sys/kern/uipc_shm.c
1907–1909

off is vm_ooffset_t type which is unsigned, only len is signed. So the addition arithmetic promotes len to unsigned, which is not a problem here. Regardless, I changed len to unsigned and cast it to off_t when checking if the operation happens completely beyond the EOF.

I suggest you to push the code change now but work some more on the man page clarifications.

lib/libc/sys/fspacectl.2
71

I am not sure that I understand what the 'successful completion without an unprocessed part' is. I want to see a clear explanation of the following:

  • how to find which part of file was zeroed
  • for corner cases, explanation how they are handled.

For instance, is it true that r_offset is always in range of file size? If yes, then together with the specification which part of file is cleared ([rmsr->r_offset; rqsr->r_offset]? is this true) it would be much earier to understand.

This revision is now accepted and ready to land.Aug 24 2021, 1:27 AM
lib/libc/sys/fspacectl.2
71

rqsr.r_offset need not to be in range of file size. while for rmsr.r_offset it is omin(rqsr.r_offset + rqsr.r_len, file_sz), so the returned r_offset is never beyond the EOF offset. So yes, the range to be cleared is [rmsr->r_offset; rqsr->r_offset).

For unprocessed part what I mean is if the call is interrupted due to internal error/pending signal.

This revision was automatically updated to reflect the committed changes.