sesutil: fix "fault all" with zoned jbods
ClosedPublic

Authored by asomers on Mar 10 2023, 5:48 PM.

Details

Summary

Some SAS JBODs support zoning. This feature allows individual SAS
targets to be accessible by only some initiator ports. One application
would be connecting two servers to the same JBOD, but they wouldn't be
able to see each other's disks.

A zoned JBOD should also prohibit initiators from accessing SES elements
corresponding to inaccessible SAS targets. It reports that by setting
the element's status code to 0x8 (No Access Allowed).

The bug is that when doing "sesutil (fault|locate) all ...", sesutil
will attempt an ENCIOC_SETELMSTAT ioctl for every single element, even
the inaccessible ones. The enclosure will reject the command, the
kernel will return EINVAL, and sesutil will exit.

The solution is to check the element's status, and skip trying to set it
if the status is 0x8. But if the user actually supplied a ses ID, then
assume that he knows what he's doing and try to set it anyway.

PR: 270093
MFC after: 1 week
Sponsored by: Axcient

Test Plan

manually tested with zoned and unzoned jbods

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 50279
Build 47171: arc lint + arc unit

Event Timeline

It may be OK, but I am not sure I like this approach. It creates additional kernel requests, and it is potentially racy. Doesn't the enclosure return some reasonable status that the kernel could convert into EACCES or something more reasonable than EINVAL? At least it would be good to do both.

In D39017#888701, @mav wrote:

It may be OK, but I am not sure I like this approach. It creates additional kernel requests, and it is potentially racy. Doesn't the enclosure return some reasonable status that the kernel could convert into EACCES or something more reasonable than EINVAL? At least it would be good to do both.

Unfortunately, the enclosure doesn't return very helpful status. Here's what happens when I issue the command with sg_ses:

Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid field in parameter list

So I don't think I can translate this into something more helpful than EINVAL. Also, I don't want to rely on sesutil trying to set every element and ignoring EACCES, because that would be very slow. And I'm not worried about races, because in all use cases that I know of, changing a JBOD's zoning configuration is something that should happen very rarely, no more than once per year. This change _does_ create additional ioctls, as you pointed out. But I don't see a good way around that. At least, not without a drastic change to the ses(4) API.

@mav given my comments above, are you ok with merging this? Do you have any better ideas for how to fix the problem?

I can't recall anything in the SES specs saying that this error should happen or how it should be handled. Usually writes to unimplemented LED bits are just ignored. I don't know why the hardware vendor decided to return an error, especially such an opaque one, but unless there is some better way to handle it (I don't have one), I'll be OK with this, even though I don't much like it.

This revision is now accepted and ready to land. Mar 27 2023, 7:39 PM

Ok, I'll commit for now. However, this vendor is sometimes responsive. They might be willing to choose a better error type if I ask. Do you have any particular SCSI sense code in mind?

This revision was automatically updated to reflect the committed changes.

As I said, I'd be OK with quietly ignoring writes. But maybe the specs could be reviewed more closely on the meaning of NOACCESS. Nothing comes to mind right now.