Block layer request queue API changes [LWN.ne...

望穿墙 2013-04-04


Block layer request queue API changes

By Jonathan Corbet
May 18, 2009

Prior to the 2.6 kernel series, the Linux block layer was somewhat simplistic and inflexible; it showed a lot of history from the early days of the Linux kernel. With the 2.5 development series came a complete rewrite; there have, of course, been a great many changes since then as well. But there are still bits of history to be found in the Linux block API. If Tejun Heo has his way, some of that history will be gone in the near future.

The standard way for a block driver to gain access to the next I/O request in the queue is with a call to:

    struct request *elv_next_request(struct request_queue *queue);

This function returns the request which is, in the I/O scheduler's opinion, the best one to execute next. An interesting feature of elv_next_request() is that it leaves the request on the queue; two calls to elv_next_request() in quick succession will return pointers to the same request. A block driver can explicitly remove the request from the queue with a call to blkdev_dequeue_request(), but that step is not necessary. If a request remains at the head of the queue when the block driver indicates that it has been completed, the block layer will dequeue the request at that time.
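The on-queue behavior described above can be modeled in a few lines of userspace C. This is only a toy sketch: the `toy_*` names and the linked-list queue are assumptions made for illustration, not the kernel's real data structures. It shows that peeking twice yields the same request and that completion dequeues a request still sitting at the head.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_request {
    int sector;
    struct toy_request *next;
};

struct toy_queue {
    struct toy_request *head;   /* the request the scheduler would pick next */
};

/* Modeled on elv_next_request(): return the head but leave it queued. */
static struct toy_request *toy_next_request(struct toy_queue *q)
{
    return q->head;
}

/* Modeled on blkdev_dequeue_request(): unlink the head request. */
static void toy_dequeue_request(struct toy_queue *q, struct toy_request *rq)
{
    assert(q->head == rq);      /* this toy only removes from the head */
    q->head = rq->next;
    rq->next = NULL;
}

/* Completion: a request still at the head is dequeued on the driver's behalf. */
static void toy_end_request(struct toy_queue *q, struct toy_request *rq)
{
    if (q->head == rq)
        toy_dequeue_request(q, rq);
}

/* Returns 1 if the model behaves as the text describes, 0 otherwise. */
static int toy_demo_on_queue(void)
{
    struct toy_request b = { 8, NULL };
    struct toy_request a = { 0, &b };
    struct toy_queue q = { &a };

    struct toy_request *first = toy_next_request(&q);
    struct toy_request *second = toy_next_request(&q);
    if (first != &a || second != &a)
        return 0;               /* two peeks must see the same request */

    toy_end_request(&q, first); /* the driver never dequeued explicitly */
    return toy_next_request(&q) == &b;
}
```

In this model the queue alone cannot tell whether a driver has started working on the head request or merely looked at it.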

Leaving the request on the queue is a throwback to the very early days, when requests were handled one at a time - often a single sector at a time. By hiding the queuing details, the block layer made life easier for simple block drivers. But this apparent simplicity comes at a cost: it complicates the block API and makes it impossible for the block layer to know when processing of a request has begun. So it's not possible to do reliable request timing when drivers work on requests which remain on the queue.

This feature is also increasingly useless. Any contemporary driver worth its salt will process multiple requests at once; that, in turn, requires that the driver dequeue requests and keep track of them itself. As a result, few of the drivers that people actually care about use the process-on-queue model. Given that, Tejun has come to the conclusion that processing on-queue requests is an idea whose time has passed. He has posted a lengthy patch series to make it go away.

The bulk of the patches are concerned with converting all drivers over to the "dequeue the request first" mode of operation. Typically that's just a matter of adding a blkdev_dequeue_request() call in the right place. A few places (the IDE subsystem, for example) are a bit more complex, but, for the most part, the changes are straightforward.

Once that has been done, the patch series culminates with a set of API changes. There is no more elv_next_request(); instead, a driver wanting to look at a request without dequeueing it will call:

    struct request *blk_peek_request(struct request_queue *queue);

Following that, a request can be dequeued with a call to blk_start_request(), which replaces blkdev_dequeue_request():

    void blk_start_request(struct request *req);

In addition to removing the request from the queue, blk_start_request() will start a timer for the request, allowing the block layer to eventually respond if completion is never signaled. Most of the time, though, drivers will just call:

    struct request *blk_fetch_request(struct request_queue *q);

which is a combination of blk_peek_request() and blk_start_request().
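The relationship between the three calls can be sketched as a userspace model. Everything here is hypothetical: the `toy_*` functions only mirror the shape of the prototypes above, and the `timer_armed` flag merely stands in for the per-request timer. The sketch assumes that fetching is nothing more than a peek followed by a start, as the text describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_queue {
    struct toy_request *head;
};

struct toy_request {
    int id;
    bool queued;                /* still linked on the queue? */
    bool timer_armed;           /* stand-in for the request timeout timer */
    struct toy_queue *q;        /* owning queue, so start needs only the request */
    struct toy_request *next;
};

/* Append a request to the tail of the queue. */
static void toy_add_request(struct toy_queue *q, struct toy_request *rq)
{
    struct toy_request **link = &q->head;

    while (*link)
        link = &(*link)->next;
    rq->q = q;
    rq->queued = true;
    rq->timer_armed = false;
    rq->next = NULL;
    *link = rq;
}

/* Modeled on blk_peek_request(): look at the head without dequeuing it. */
static struct toy_request *toy_peek_request(struct toy_queue *q)
{
    return q->head;
}

/* Modeled on blk_start_request(): dequeue the request and arm its timer. */
static void toy_start_request(struct toy_request *rq)
{
    assert(rq->queued && rq->q->head == rq);
    rq->q->head = rq->next;
    rq->next = NULL;
    rq->queued = false;
    rq->timer_armed = true;
}

/* Modeled on blk_fetch_request(): peek, then start if anything was there. */
static struct toy_request *toy_fetch_request(struct toy_queue *q)
{
    struct toy_request *rq = toy_peek_request(q);

    if (rq)
        toy_start_request(rq);
    return rq;
}

/* A driver-style loop: start every queued request, return how many. */
static int toy_drain(struct toy_queue *q)
{
    struct toy_request *rq;
    int started = 0;

    while ((rq = toy_fetch_request(q)) != NULL) {
        assert(!rq->queued && rq->timer_armed);
        started++;
    }
    return started;
}
```

Because every fetched request leaves the queue at a well-defined moment, the model has a single place where a timer can be armed.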

There is one other, under-the-hood change which goes along with the above: any attempt to complete a request which remains on the request queue will oops the system. One can think of this as a very clear message that on-queue processing is no longer considered to be the right thing to do in the Linux kernel. That, in turn, is part of the motivation for the API changes, which, for the most part, are just name changes: Tejun wants to be sure that maintainers of out-of-tree block drivers will notice that something has changed and respond accordingly.
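What this rule means for an unconverted driver can be illustrated with another toy model. The names below are invented for the example, and a `-1` return value stands in for the oops that the real kernel would produce; the actual check inside the block layer is not shown in this article.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_request {
    bool queued;                /* still on the request queue? */
    bool done;                  /* completion accepted? */
};

/* Dequeue step, standing in for blk_start_request(). */
static void toy_start_request(struct toy_request *rq)
{
    assert(rq->queued);
    rq->queued = false;
}

/* Completion: returns 0 on success, -1 where the real kernel would oops. */
static int toy_end_request(struct toy_request *rq)
{
    if (rq->queued)
        return -1;              /* on-queue completion is now a driver bug */
    rq->done = true;
    return 0;
}

/* Old-style driver: works on the request in place, then completes it. */
static int toy_old_style_driver(struct toy_request *rq)
{
    return toy_end_request(rq);
}

/* Converted driver: dequeues first, then completes. */
static int toy_converted_driver(struct toy_request *rq)
{
    toy_start_request(rq);
    return toy_end_request(rq);
}
```

In this sketch the only difference between the two drivers is the single dequeue call, which matches the description of most conversions as a one-line change.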

These patches have been through a couple of rounds of review. Nothing is ever certain, but it's entirely possible that this set of changes could go in for the 2.6.31 kernel.
