Page MenuHomeFreeBSD

Add tests for ses(4)
ClosedPublic

Authored by asomers on Sep 2 2021, 8:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sep 13 2024, 2:35 AM
Unknown Object (File)
Sep 9 2024, 6:49 PM
Unknown Object (File)
Sep 4 2024, 8:28 AM
Unknown Object (File)
Aug 19 2024, 7:01 AM
Unknown Object (File)
Aug 18 2024, 11:19 AM
Unknown Object (File)
Aug 17 2024, 9:05 PM
Unknown Object (File)
Aug 10 2024, 2:03 PM
Unknown Object (File)
Aug 10 2024, 2:02 PM
Subscribers

Details

Reviewers
ken
mav
Group Reviewers
cam
Commits
rGeea7c61590ae: Add tests for ses(4)
Summary

Add tests for ses(4)

WIP adding tests for ses(4)

WIP on a setelmstat test

"Fix" cleanup for the setelmstat test

sg_ses doesn't provide any kind of dump-and-restore functionality,
contrary to my initial assumption. So during cleanup, forget about the
initial state and just clear the ident bits

Rename "smoke" to "nondestructive" and cleanup old files

Test Plan

Tests cases added. They require ses hardware.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 41337
Build 38226: arc lint + arc unit

Event Timeline

Revert unintended changes to contrib/atf

I'm assuming this will need to run on a system with a SES device.

That's fine. Someone might write one for CTL someday, but that would be too much work for just a test rig.

This revision is now accepted and ready to land.Jan 19 2022, 6:33 PM
In D31809#767688, @ken wrote:

I'm assuming this will need to run on a system with a SES device.

That's fine. Someone might write one for CTL someday, but that would be too much work for just a test rig.

Yeah, the tests will be skipped if no SES device is present. And unfortunately, the tests will fail if a buggy SES device is present. I don't have a good solution to that, other than to write a virtual SES device in bhyve just for testing purposes.

This revision was automatically updated to reflect the committed changes.
tests/sys/ses/destructive.c
69

I am not a "style nazi", but why all one line? Some ATF style?

106

See my recent commit 2e19fae49fd. SES is weird. You can't write status into control on arbitrary hardware.

141

This loop is weird. You are reading element status twice before checking its status first time.

Also why do you need the delay? Are there enclosures where ident is not getting back immediately? If so, then read-modify-write attempts are even more hopeless than coming from SES specification.

If the ident got set, does it make sense to read it more? Is it intentional?

155

It is practically impossible to restore state of some bits, that is probably there is no "dump and restore". See my recent commit 2e19fae49fd. SES is weird.

170

Why neither error handling nor assertion? And neither on line 190.

238

Should I implement it for AHCI virtual SES, or what SES hardware this will be run on more often?

286

Why TODO, considering setelmstat?

tests/sys/ses/nondestructive.c
166

You should not compare so straight without masking. There are at least 3 other non-reserved bits.

197

I haven't seen NVMe SES so far and may be wrong, but instead of "nvme" and probably "nvd" I'd expect to see "nda" here.

asomers added inline comments.
tests/sys/ses/destructive.c
69

I'll change it.

106

Ahh, that change crossed this one in review. So I should be good if I run cstat[0] through ses_status_to_ctrl?

141

Yeah, I'll remove the first ioctl. As for the loop, I don't recall if I found a SES device that responds slowly, it's been too long since I wrote this test. But there's another reason: I have a WIP change, not yet committed, that makes ENCIOC_SETELMSTAT asynchronous. There are two reasons for that:

  • Using ENCIOC_SETELMSTAT for each element in a large array is very slow, because it can result in a separate SCSI SEND DIAGNOSTIC command for each element. Making the ioctl asynchronous allows the driver to batch them together, sending far fewer SCSI commands.
  • There is a race condition easily triggered by calling ENCIOC_SETELMSTAT back to back on large enclosures, that results in a deadlock. Asynchronous ENCIOC_SETELMSTAT is part of the solution for that race.
170

I'll add it.

238

What is AHCI virtual SES? Do you mean SGPIO? I don't think SGPIO is good enough to be of any use for almost anybody. It was my expectation that these tests, when they run at all, would run with regular SAS SES controllers.

286

Why TODO, considering setelmstat?

Oops, I'll remove it.

tests/sys/ses/nondestructive.c
166

Ok, I'll add a mask. BTW, sesutil also compares without a mask.

197

I haven't seen any either. I'll add an "nda" check.

tests/sys/ses/nondestructive.c
197

There is an NVMe-MI (Management Interface) specification at nvmexpress.org

It borrows a lot from SES, and explicitly references SES.

I have not yet seen any hardware that implements it. The lack of enclosure management has meant that we've had to hack in some fake enclosure support with hard-coded slot mapping so our GUI can show where drives are.

tests/sys/ses/destructive.c
106

As good as possible. I've found there few bits that just can't be read, but has to be set.

141

I was never happy with present asynchronous reads. Old data they returned were confusing few times. At very least I think it should be redone in some way, may be limiting number of concurrent requests or maximum request rate, but it should not have periodic polling every X seconds.

Asynchronous writes are even more confusing. At very least all asynchronous writes should be allowed to complete before any new reads are allowed, otherwise we'll get response that may not match real hardware one. Present code always forces read-bad after every write for that purpose.

Also some control bits have command semantics, like "do action X", not "set bit to X". Such bits are already weird across read-modify-write, but may become even more weird in case of delayed writes.

238

Yes. While SGPIO is indeed very simplistic, it allows to control few LEDs per drive and present in almost Intel chipset.

tests/sys/ses/nondestructive.c
166

I mean ses_status_common_get_element_status_code() or whatever it is after all those macros unrolled...

197

Yes, that is what we are doing also now.