chapter 14 - The Block I/O Layer

Block Device hardware devices distinguished by the random access of fixed-size chunks of data

Anatomy of a Block Device

sector
- block device 에서 접근할 수 있는 데이터의 최소 단위
block
- 소프트웨어에서 접근할 수 있는 데이터의 논리적 최소 단위
- Although the physical device is addressable at the sector level, the kernel performs all disk operations in terms of blocks.
device는 sector를 최소 단위로 하여 동작하지만, 커널은 모든 디스크 operation을 block 단위로 수행한다.

Buffers and Buffer Heads

When a block is stored in memory(after a read or pending a write), it is stored in a buffer.
Each buffer is associated with exactly one block.

Because the kernel requires some associated control information to accompany the data, each buffer is associated with a descriptor, a buffer head.

$include/linux/buffer_head.h
struct buffer_head {
  unsigned long b_state;           // buffer status flags
  struct buffer_head *b_this_page; // per-page buffer list
  struct page *b_page;             // pointer to the descriptor for the page that stores the buffer
  sector_t b_blocknr;              // logical block number
  size_t b_size;                   // block size
  char *b_data;                    // pointer to buffer
  atomic_t b_count;                // use count
  /* etc */
};

b_count
- the buffer's usage count
- Before manipulating a buffer head, you must increment its reference count via get_bh() to ensure that the buffer head is not allocated out from under you.
- When finished with the buffer head, decrement the reference count via put_bh().
```
$include/linux/buffer_head.h
static inline void get_bh(struct buffer_head *bh)
{
atomic_inc(&bh->b_count);
}
static inline void put_bh(struct buffer_head *bh)
{
atomic_dec(&bh->b_count);
}
```

Before the 2.6 kernel, the buffer head was used as the unit of I/O in the kernel.
Not only did the buffer head describe the disk-block-to-physical-page mapping, but it also acted as the container used for all block I/O.
problem
- The buffer head was a large and unwieldy data structure, and it was neither clean nor simple t manipulate data in terms of buffer heads.
  - In the 2.6 kernel, much work has gone into making the kernel work directly with pages and address spaces instead of buffers.
  - address_space structure, pdflush daemons
- The buffer head describes only a single buffer.
  - The primary goal of the 2.5 development kernel was to introduce a new flexible, and lightweight container for block I/O operations.
  - bio structure

The bio structure

$include/linux/blk_types.h
struct bio;

$include/linux/bvec.h
struct bio_vec {
    /* pointer to the physical page on which this buffer resides */
    struct page *bv_page;

    /* the length in bytes of this buffer */
    unsigned int bv_len;

    /* the byte offset within the page where the buffer resides */
    unsigned int bv_offset;
};

The basic container for block I/O within the kernel is the bio structure.
The primary purpose of a bio structure is to represent an in-flight I/O operation.
The bio structure provides the capability for the kernel to perform block I/O operations of even a single buffer from multiple locations in memory.

Request Queues

$include/linux/blkdev.h
struct request;
struct request_queue;

Block devices maintain request queues to store their pending block I/O requests, request_queue structure.
The request queue contains a doubly linked list of requests and associated control information.
Each item in the queue's request list is a single request, of type struct request.
Each request can be composed of more than one bio structure.

I/O Schedulers

The Job of an I/O Scheduler

An I/O scheduler works by managing a block device's request queue.
It manages the request queue with the goal of reducing seek time, which results in greater global throughput.
I/O schedulers perform two primary actions to minimize seek time: merging and sorting.
- merging : If a request is already in the queue to read from an adjacent sector on the disk, the two request can be merged into a single request operating on one or more adjacent on-disk sectors.
- sorting : The entire request queue is kept sorted, sectorwise, so that all seeking activity along the queue moves sequentially over the sectors of the hard disk.

The Linus Elevator

$block/elevator.c

properties
- If a request to an adjacent on-disk sector is in the queue, the existing request and the new request merge into a single request.
- If a request in the queue is sufficiently old, the new request is inserted at the tail of the queue to prevent starvation of the other, older, requests.
- If a suitable location sector-wise is in the queue, the new request is inserted there. This keeps the queue sorted physical location on disk.
- Finally, if no such suitable insertion point exists, the request is inserted at the tail of the queue.

The Linus Elevator could provide better overall throughput via a greater minimization of seeks if it always inserted requests into the queue sector-wise and never checked for old requests and reverted to insertion at the tail of the queue.
starvation
- Heavy disk I/O operations to one area of the disk can indefinitely starve request operations to another part of the disk.
- Indeed, a stream of requests to the same area of the disk can result in other far-off requests never being serviced. This starvation is unfair.

The Deadline I/O Scheduler

$block/deadline-iosched.c

properties
- When a new request is submitted to the sorted queue, the Deadline I/O scheduler performs merging and insertion like the Linus Elevator.
- Deadline I/O scheduler also, however, inserts the request into a second queue that depends on the type of request, read FIFO and write FIFO.
- Under normal operation, the Deadline I/O scheduler pulls requests from the head of the sorted queue into the dispatch queue.
- If the request at the head of either the write FIFO queue or the read FIFO queue expires, the Deadline I/O scheduler then begins servicing requests from the FIFO queue.

The Deadline I/O scheduler works harder to limit starvation while still providing good global throughput.

The Anticipatory I/O Scheduler

properties
- After the request is submitted, the Anticipatory I/O scheduler does not immediately seek back and return to handling other requests.
- Instead, it does absolutely nothing for a few milliseconds. In those few milliseconds, there is a good chance that the application will submit another read request.
- After the waiting period elapses, the Anticipatory I/O scheduler seeks back to where if left off and continues handling the previous requests.

The Anticipatory I/O scheduler attempts to minimize the seek storm that accompanies read requests issued during other disk I/O activity.
With a sufficiently high percentage of correct anticipations, the Anticipatory I/O scheduler can greatly reduce the penalty of seeking to service read requests, while still providing the attention to such requests that system response requires.

The Complete Fair Queuing I/O Scheduler

$block/cfq-iosched.c

properties
- There is one queue for each process submitting I/O.
- The CFQ I/O scheduler services the queues round robin, plucking a configurable number of requests from each queue before continuing on to the next.
The CFQ I/O scheduler is an I/O scheduler designed for specialized workloads, but that in practice actually provides good performance across multiple workloads.

The Noop I/O Scheduler

$block/noop-iosched.c

properties
- The Noop I/O scheduler does not perform sorting or any other form of seek-prevention whatsoever.
- The Noop I/O scheduler does perform merging as its lone chore.
The Noop I/O scheduler is intended for block devices that are truly random-access, such as flash memory cards.
If a block device has little or no overhead associated with "seeking", then there is no need for insertion sorting of incoming request, and the Noop I/O scheduler is the ideal candidate.

14. The Block I/O Layer

chapter 14 - The Block I/O Layer

Anatomy of a Block Device

Buffers and Buffer Heads

The bio structure

Request Queues

I/O Schedulers

The Job of an I/O Scheduler

The Linus Elevator

The Deadline I/O Scheduler

The Anticipatory I/O Scheduler

The Complete Fair Queuing I/O Scheduler

The Noop I/O Scheduler

results matching ""

No results matching ""