Differences of less than 4 (RQ_PPQ) are insignificant and are simply removed. No functional change (intended).
Wed, May 29
I suspect that the first thread was skipped to avoid stealing a thread that was just scheduled to a CPU but has not been able to run yet.
I am not fully sure about the motivation for this change, but it feels wrong to me to have per-namespace zones. On a big system under heavy load UMA does a lot of work for per-CPU and per-domain caching, and doing it per-namespace as well would multiply the resource waste. Also, the last time I touched it, I remember it was difficult for UMA to operate in severely constrained environments, since eviction of per-CPU caches is quite expensive. I don't remember how reservation works in that context, but I suppose that having dozens of small zones with small reservations but huge per-CPU caches is not a very viable configuration.
Thu, May 23
Tue, May 14
I see no problems, but I find it difficult to believe that timeout handlers running 1-2 times per second per queue pair may have any visible effect. Also I am not happy to see a second place where timeouts are calculated. And 99/100 also looks quite arbitrary.
Mechanically it seems to make sense. I missed when that original transition happened, but if you say it is right, so be it.
Tue, May 7
I wonder if there is any real architecture where pointer load/store is non-atomic. For things that are going to be executed between once and never it feels like you are over-engineering it. :)
I have no objections, if it is useful.
May 3 2024
Apr 27 2024
Apr 26 2024
In D44961#1025280, @asomers wrote: What is an "OOA queue"?
I wonder what your queue depth is, so that one message per request per 90 seconds would cause a noticeable storm. Also, per-system limiting makes the output not very useful: since it selects the first message out of many, it says little about LUNs, ports, commands, etc., only that something is wrong. Thinking even wider, I find those messages printed on actual completion not very useful, since if these are not just delays but something is really wrong, the commands may never complete and so the messages may never get printed. I wonder if instead removing all this and checking the OOA queues for stuck requests once per second and printing some digests would be more useful.
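To make the last suggestion concrete, here is a minimal userland sketch of such a periodic scan. Everything in it is hypothetical: `struct ooa_req`, `struct ooa_digest`, `ooa_scan()` and `ooa_report()` are illustrative names, not CTL's actual structures or functions, and a real implementation would walk the per-LUN OOA queues under their locks from a once-per-second callout.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical request record; not CTL's real I/O structure. */
struct ooa_req {
	uint64_t start_ticks;	/* time the request entered the queue */
	uint32_t lun;		/* LUN the request targets */
};

/* Summary of one scan pass, printed at most once per pass. */
struct ooa_digest {
	size_t   stuck;		/* requests at or above the age threshold */
	uint64_t oldest_age;	/* age of the oldest stuck request */
	uint32_t oldest_lun;	/* LUN of that oldest request */
};

/* Scan all queued requests and summarize those older than threshold. */
struct ooa_digest
ooa_scan(const struct ooa_req *q, size_t n, uint64_t now, uint64_t threshold)
{
	struct ooa_digest d = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		uint64_t age = now - q[i].start_ticks;

		if (age < threshold)
			continue;
		d.stuck++;
		if (age > d.oldest_age) {
			d.oldest_age = age;
			d.oldest_lun = q[i].lun;
		}
	}
	return (d);
}

/* Print one digest line per pass instead of one line per request. */
void
ooa_report(const struct ooa_digest *d)
{
	if (d->stuck == 0)
		return;
	printf("%zu stuck request(s), oldest %" PRIu64 " ticks on LUN %" PRIu32 "\n",
	    d->stuck, d->oldest_age, d->oldest_lun);
}
```

Because the scan looks at requests still in the queue, it would report commands that never complete, which completion-time messages cannot do.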
Apr 20 2024
Looks good to me, but if you wish, here are a couple of cosmetic thoughts.
Looks good to me, though seems only cosmetic.
Apr 17 2024
Apr 10 2024
Mar 25 2024
Mar 21 2024
I don't have any chip documentation to know what is right here, so I just wonder whether unconditionally printing a bunch of raw hex numbers is expected here. It feels like mpi3mr_print_fault_info() is another candidate for mpi3mr_dprint().
I am not a big fan of the kernel printing something in response to arbitrary user requests; it makes logs messy. Is the error reporting to the user not enough here?
Mar 18 2024
Why not backport 506fe78c48 instead?
Mar 15 2024
My only complaint is that it puts the queue into the same cache line as the main queue, which may be modified by writers. But if you really need it for debugging, it is understandable.
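As an illustration of the cache-line concern, the following sketch keeps a debugging field on its own line, away from fields that writers modify. The structure and field names are hypothetical, and the 64-byte line size is an assumption (the FreeBSD kernel would use `CACHE_LINE_SIZE`); this is not the layout from the review.

```c
#include <stdalign.h>
#include <stddef.h>

#define	SKETCH_CACHE_LINE	64	/* assumed line size for this sketch */

/* Hypothetical queue state; not the structure under review. */
struct sketch_queue {
	/* Hot fields, modified by writers on every enqueue/dequeue. */
	alignas(SKETCH_CACHE_LINE) unsigned head;
	unsigned tail;

	/*
	 * Debug-only field, forced onto a separate cache line so that
	 * readers of it do not contend with writers of head/tail.
	 */
	alignas(SKETCH_CACHE_LINE) unsigned debug_depth;
};
```

The cost is extra padding per queue, which is the usual trade-off against false sharing.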
Mar 6 2024
Mar 5 2024
On failure we've already notified consumers that the controller has failed. What will report that it is back? And is there even a device to send the request IOCTL to?
If you say it helps I have no objections, but I see nvme_sim_controller_fail() destroying SIM, so I am not sure you actually get here.
I wonder if there are any namespace-specific events? I remember the NVMe specs allow per-namespace SMART, but I don't remember many details now.
In D39620#1008905, @sean_rogue-research.com wrote: stable/13 has this patch
releng/13.2 doesn't have this patch (yet). I'm not very familiar with FreeBSD's branching system... I see FreeBSD 13.3-RELEASE was released today, is this bug fix included?
Feb 27 2024
Feb 5 2024
Jan 27 2024
Jan 19 2024
Jan 10 2024
In D43385#989072, @gallatin wrote: Is it possible that the firmware could set ACPI_HEST_GEN_ERROR_FATAL in ged->ErrorSeverity but not ges->ErrorSeverity?
There is already a panic in apei_ge_handler(), based on the total status severity. Do you find it insufficient?