
cam: Add CAM_NVME_STATUS_ERROR error code
ClosedPublic

Authored by imp on Jul 19 2023, 4:15 AM.
Details

Summary

Add CAM_NVME_STATUS_ERROR error code. Flag all NVMe commands that
completed with an error status as CAM_NVME_STATUS_ERROR (a new value)
instead of CAM_REQ_CMP_ERR. This indicates to the upper layers of CAM
that the 'cpl' field for nvmeio CCBs is valid and can be examined for
error recovery, if desired.

No functional change. nda will still see these as errors, call
ndaerror() to get the error recovery action, etc. cam_periph_error will
select the same case as before (even without the change, though the
change makes it explicit).

Sponsored by: Netflix
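
As a rough illustration (not part of the change itself): an upper layer that
sees the new status could consult cpl along these lines. The completion
routine below is hypothetical; the NVME_STATUS_GET_*() accessors are the
existing ones from sys/dev/nvme/nvme.h.

/*
 * Sketch only: a hypothetical periph completion routine reacting to the
 * new status.  The SIM has copied the NVMe completion into nvmeio->cpl.
 */
static void
xxx_nvme_done(struct cam_periph *periph, union ccb *done_ccb)
{
        struct ccb_nvmeio *nvmeio = &done_ccb->nvmeio;
        uint16_t status;

        if ((done_ccb->ccb_h.status & CAM_STATUS_MASK) == CAM_NVME_STATUS_ERROR) {
                /* cpl is valid: pull the NVMe status fields out of it. */
                status = nvmeio->cpl.status;
                printf("nvme error: sct=%u sc=%u dnr=%u\n",
                    NVME_STATUS_GET_SCT(status), NVME_STATUS_GET_SC(status),
                    NVME_STATUS_GET_DNR(status));
                /* ...decide on retry vs. fail based on sct/sc/dnr... */
        }
}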

Diff Detail

Repository: rG FreeBSD src repository
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

imp requested review of this revision. Jul 19 2023, 4:15 AM
chuck added a subscriber: chuck.
chuck added inline comments.
sys/cam/cam.h
243–248

Nit: typo, should be 'error'.

This revision is now accepted and ready to land. Jul 19 2023, 12:47 PM
mav accepted this revision. Edited Jul 19 2023, 2:30 PM

Looks good to me.

Just about "the 'cpl' field for nvmeio CCBs is valid and can be examined for error recovery", it would be good if we could report additional information. For example, the SATA specs at some point grew the ability to report some SCSI sense bytes in addition to the extremely limited ATA status, and we have no way to pass that to CAM now. IIRC NVMe already has the same in the shape of the error log page, whose records provide additional information for the specific error and make no sense without it. IIRC we should already be fetching them, but probably aren't using them for anything.

In D41085#935527, @mav wrote:

Looks good to me.

Just about "the 'cpl' field for nvmeio CCBs is valid and can be examined for error recovery", it would be good if we could report additional information. For example, the SATA specs at some point grew the ability to report some SCSI sense bytes in addition to the extremely limited ATA status, and we have no way to pass that to CAM now.

How do we get these bytes? Is that something each SIM will need to do? I assume they are in addition to the result registers in ataio's res field? We do have 32 bits spare right now (marked 'unused') that could be used for at least part of this. Is it the full sense buffer, or just asc and ascq (+ sense key maybe)? If it's just those two or three items, they'd fit....
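
Purely as a sketch of that packing, assuming just sense key + asc + ascq (none of these macros exist today):

/* Hypothetical packing of sense key/asc/ascq into one spare 32-bit word. */
#define ATAIO_SENSE_PACK(key, asc, ascq) \
        (((uint32_t)(key) << 16) | ((uint32_t)(asc) << 8) | (uint32_t)(ascq))
#define ATAIO_SENSE_KEY(w)      (((w) >> 16) & 0xff)
#define ATAIO_SENSE_ASC(w)      (((w) >> 8) & 0xff)
#define ATAIO_SENSE_ASCQ(w)     ((w) & 0xff)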

IIRC NVMe already has the same in the shape of the error log page, whose records provide additional information for the specific error and make no sense without it. IIRC we should already be fetching them, but probably aren't using them for anything.

I'm a bit torn on these. We'd need to read the page when we start to find the 'oldest' error so we can 'ignore' that. We'd need to grab the page again when an error happens (fast, but we're potentially in the error recovery path) to get the additional data:

Entry 01
=========
 Error count:          5
 Submission queue ID:  4
 Command ID:           118
 Status:
  Phase tag:           0
  Status code:         129
  Status code type:    2
  More:                1
  DNR:                 0
 Error location:       0
 LBA:                  0
 Namespace ID:         1
 Vendor specific info: 0
 Transport type:       0
 Command specific info:0
 Transport specific:   0

The submission queue id, command id, and status are returned in cpl now. I don't think the others are, but at least the namespace id we'd be able to infer from the command. This gives us only an additional couple of words of data. If we publish the bits from cpl / cmd that are needed, the application program can mix/match and get this data (I'd need to add the submission queue id to what I've done, I think). I'm not entirely sure that the kernel is the right place to be adding this info, but I'm open to hearing arguments for adding this complexity (since I'm mostly thinking off the top of my head and haven't delved into the topic).
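
For reference, the mix/match would be something like the helper below. It is only a sketch (the function is hypothetical), assuming the error-information-entry structure in sys/dev/nvme/nvme.h keeps the spec's sqid/cid fields:

/*
 * Sketch: pair an Error Information log entry (LID 0x01) with the cpl
 * from the failed nvmeio CCB.
 */
static bool
nvme_err_entry_matches(const struct nvme_error_information_entry *e,
    const struct nvme_completion *cpl)
{
        /*
         * sqid + cid identify the failed command, as long as the queue
         * hasn't wrapped and reused those IDs in the meantime.
         */
        return (e->sqid == cpl->sqid && e->cid == cpl->cid);
}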

In D41085#935867, @imp wrote:
In D41085#935527, @mav wrote:

Just about "the 'cpl' field for nvmeio CCBs is valid and can be examined for error recovery", it would be good if we could report additional information. For example, the SATA specs at some point grew the ability to report some SCSI sense bytes in addition to the extremely limited ATA status, and we have no way to pass that to CAM now.

How do we get these bytes? Is that something each SIM will need to do? I assume they are in addition to the result registers in ataio's res field? We do have 32 bits spare right now (marked 'unused') that could be used for at least part of this. Is it the full sense buffer, or just asc and ascq (+ sense key maybe)? If it's just those two or three items, they'd fit....

It is not a register, but additional bytes in the NCQ Command Error log, which is already requested by the SIM for any NCQ error. Originally that log contained only the standard ATA registers, but later specs added more. Maybe it can be read for non-NCQ commands too, I don't remember, but previously it didn't include sense data.

14 SENSE KEY field (see 9.13.6)
15 ADDITIONAL SENSE CODE field (see 9.13.6)
16 ADDITIONAL SENSE CODE QUALIFIER field (see 9.13.6)
17 FINAL LBA IN ERROR field (7:0) (see 9.13.7)
18 FINAL LBA IN ERROR field (15:8) (see 9.13.7)
19 FINAL LBA IN ERROR field (23:16) (see 9.13.7)
20 FINAL LBA IN ERROR field (31:24) (see 9.13.7)
21 FINAL LBA IN ERROR field (39:32) (see 9.13.7)
22 FINAL LBA IN ERROR field (47:40) (see 9.13.7)

So in a sense it is similar to what NVMe later provided for its errors too.

IIRC NVMe already has the same in the shape of the error log page, whose records provide additional information for the specific error and make no sense without it. IIRC we should already be fetching them, but probably aren't using them for anything.

I'm a bit torn on these. We'd need to read the page when we start to find the 'oldest' error so we can 'ignore' that. We'd need to grab the page again when an error happens (fast, but we're potentially in the error recovery path) to get the additional data:

Entry 01
=========
 Error count:          5
 Submission queue ID:  4
 Command ID:           118
 Status:
  Phase tag:           0
  Status code:         129
  Status code type:    2
  More:                1
  DNR:                 0
 Error location:       0
 LBA:                  0
 Namespace ID:         1
 Vendor specific info: 0
 Transport type:       0
 Command specific info:0
 Transport specific:   0

The submission queue id, command id, and status are returned in cpl now. I don't think the others are, but at least the namespace id we'd be able to infer from the command. This gives us only an additional couple of words of data. If we publish the bits from cpl / cmd that are needed, the application program can mix/match and get this data (I'd need to add the submission queue id to what I've done, I think). I'm not entirely sure that the kernel is the right place to be adding this info, but I'm open to hearing arguments for adding this complexity (since I'm mostly thinking off the top of my head and haven't delved into the topic).

I am not sure how it can be reliably fetched outside the NVMe driver, same as for SATA. Otherwise, at the very least, the combination of queue and command IDs may be reused and we'll get some garbage. I have a feeling we are already fetching it in the driver, just not using it, but I haven't touched that code for a while.

In D41085#935886, @mav wrote:
In D41085#935867, @imp wrote:
In D41085#935527, @mav wrote:

Just about "the 'cpl' field for nvmeio CCBs is valid and can be examined for error recovery", it would be good if we could report additional information. For example, the SATA specs at some point grew the ability to report some SCSI sense bytes in addition to the extremely limited ATA status, and we have no way to pass that to CAM now.

How do we get these bytes? Is that something each SIM will need to do? I assume they are in addition to the result registers in ataio's res field? We do have 32 bits spare right now (marked 'unused') that could be used for at least part of this. Is it the full sense buffer, or just asc and ascq (+ sense key maybe)? If it's just those two or three items, they'd fit....

It is not a register, but additional bytes in the NCQ Command Error log, which is already requested by the SIM for any NCQ error. Originally that log contained only the standard ATA registers, but later specs added more. Maybe it can be read for non-NCQ commands too, I don't remember, but previously it didn't include sense data.

14 SENSE KEY field (see 9.13.6)
15 ADDITIONAL SENSE CODE field (see 9.13.6)
16 ADDITIONAL SENSE CODE QUALIFIER field (see 9.13.6)
17 FINAL LBA IN ERROR field (7:0) (see 9.13.7)
18 FINAL LBA IN ERROR field (15:8) (see 9.13.7)
19 FINAL LBA IN ERROR field (23:16) (see 9.13.7)
20 FINAL LBA IN ERROR field (31:24) (see 9.13.7)
21 FINAL LBA IN ERROR field (39:32) (see 9.13.7)
22 FINAL LBA IN ERROR field (47:40) (see 9.13.7)

So in a sense it is similar to what NVMe later provided for its errors too.

OK. I'll see if I can find a good way to shoehorn that data into the existing structure (most likely with a new ATA_FLAG to indicate the buffer is large enough to handle the sense data, since that seems to be what they are there for).

IIRC NVMe already has the same in the shape of the error log page, whose records provide additional information for the specific error and make no sense without it. IIRC we should already be fetching them, but probably aren't using them for anything.

I'm a bit torn on these. We'd need to read the page when we start to find the 'oldest' error so we can 'ignore' that. We'd need to grab the page again when an error happens (fast, but we're potentially in the error recovery path) to get the additional data:

Entry 01
=========
 Error count:          5
 Submission queue ID:  4
 Command ID:           118
 Status:
  Phase tag:           0
  Status code:         129
  Status code type:    2
  More:                1
  DNR:                 0
 Error location:       0
 LBA:                  0
 Namespace ID:         1
 Vendor specific info: 0
 Transport type:       0
 Command specific info:0
 Transport specific:   0

The submission queue id, command id, and status are returned in cpl now. I don't think the others are, but at least the namespace id we'd be able to infer from the command. This gives us only an additional couple of words of data. If we publish the bits from cpl / cmd that are needed, the application program can mix/match and get this data (I'd need to add the submission queue id to what I've done, I think). I'm not entirely sure that the kernel is the right place to be adding this info, but I'm open to hearing arguments for adding this complexity (since I'm mostly thinking off the top of my head and haven't delved into the topic).

I am not sure how it can be reliably fetched outside the NVMe driver, same as for SATA. Otherwise, at the very least, the combination of queue and command IDs may be reused and we'll get some garbage. I have a feeling we are already fetching it in the driver, just not using it, but I haven't touched that code for a while.

I don't think we're fetching it, but I'll need to expand nvmeio the same way, most likely by turning 'unused' into a flag word and using it to indicate the error log page data has been fetched there. Errors are infrequent enough that it's easy to fetch, but if there are two at the same time, I'm not sure what to do... to handle that we'd likely need a 'cache' of the most recent errors and check that before reading the device...
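
Roughly what I have in mind for that cache, entirely as a sketch (names and size are made up):

/*
 * Hypothetical per-controller ring of recently fetched error log entries,
 * consulted before issuing another GET LOG PAGE from the error path.
 */
#define NVME_ERR_CACHE_SIZE     8       /* arbitrary */

struct nvme_err_cache {
        struct nvme_error_information_entry     ent[NVME_ERR_CACHE_SIZE];
        u_int                                   next;   /* ring index */
};

static const struct nvme_error_information_entry *
nvme_err_cache_lookup(const struct nvme_err_cache *c, uint16_t sqid, uint16_t cid)
{
        u_int i;

        for (i = 0; i < NVME_ERR_CACHE_SIZE; i++) {
                if (c->ent[i].sqid == sqid && c->ent[i].cid == cid)
                        return (&c->ent[i]);
        }
        return (NULL);          /* miss: would need to fetch the log page */
}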

I'm not sure if it's a good idea for the driver to be fetching the error log page vs letting something in userspace do that and match things up based on the queue ID and CID, etc. That feels like something that should live in userspace and not the driver. You can even do it post-boot by comparing /var/log/messages output with the persistent error log page.

@jhb In the case of SATA the log page needs to be read before the next NCQ error, or better, before the next NCQ command, since there is only one log entry. NVMe relaxes this a bit, but as I said above, the queue can rotate too fast, and the log is limited as well.

I consider it a form of REQUEST SENSE, which we originally did at the periph layer but later migrated into HBAs or SIMs.

This revision was automatically updated to reflect the committed changes.