Buffer Management Algorithms
CS560 NOTES on I/O Buffers in Unix
0. REFERENCES: Notes on Unix I/O buffer Management
M.J. Bach: The Design of the UNIX Operating System, Chapter 3
1. Purpose:
A set of NBUF buffers in kernel (K) space is used as a cache between
block devices, e.g. disks, and processes doing block read/write.
The goal is to reduce the number of actual I/O operations.
Basic principle:
When a process wants to read from (dev,blk), it first searches the
buffer cache for a buffer assigned to this (dev,blk). If such a
buffer exists with valid data, it simply reads data from the buffer
without incurring any I/O operation. If such a buffer does not exist,
it tries to find a free buffer, assigns the buffer to (dev,blk), issues
a DiskRead() operation, waits for I/O completion, then reads data from
the buffer. Once a (dev,blk) is read in, the buffer will remain in the
buffer cache for possible later read requests for the same (dev,blk) by ANY
process.
When a process wants to write to (dev,blk), it writes to a buffer
assigned to (dev,blk). Actual writing to the device may take place
much later.
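The read path above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the names cache_read(), sim_disk_read(), disk_reads and next_victim are hypothetical, there is no sleeping or BUSY state, and the round-robin choice of a buffer to reassign is a stand-in for the real free-list policy described later.

```c
#include <assert.h>
#include <string.h>

#define NBUF   4      /* number of cache buffers (small, for illustration) */
#define BLKSZ  512    /* bytes per block */

struct cbuf {
    int  dev, blk;    /* identity of the cached block */
    int  valid;       /* nonzero once data has been read in */
    char data[BLKSZ];
};

static struct cbuf cache[NBUF];
static int next_victim;   /* hypothetical round-robin replacement cursor */
int disk_reads;           /* counts simulated device reads */

/* Stand-in for DiskRead(): fills the block with a recognizable pattern. */
static void sim_disk_read(int dev, int blk, char *dst)
{
    memset(dst, (dev * 16 + blk) & 0x7f, BLKSZ);
    disk_reads++;
}

/* Return the cached data for (dev,blk), doing "disk" I/O only on a miss. */
char *cache_read(int dev, int blk)
{
    int i;
    struct cbuf *bp;

    assert(dev >= 0 && blk >= 0);
    for (i = 0; i < NBUF; i++) {
        bp = &cache[i];
        if (bp->valid && bp->dev == dev && bp->blk == blk)
            return bp->data;              /* hit: no I/O operation */
    }
    bp = &cache[next_victim];             /* miss: reassign a buffer */
    next_victim = (next_victim + 1) % NBUF;
    bp->dev = dev;
    bp->blk = blk;
    sim_disk_read(dev, blk, bp->data);
    bp->valid = 1;
    return bp->data;
}
```

Calling cache_read(1, 7) twice in a row increments disk_reads only once; the second call is served from the buffer.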
2. Block I/O Functions (Single_CPU/Uniprocessor Unix)
6221 readi(aip) // basic logic
{
aip points at inode (containing dev);
from current offset, compute logical blk, then physical blk number;
bp = bread(dev, blk); // returns a BUFFER pointer containing data
iomove(bp, ....); // copy data from this BUFFER to dest location
}
4754 struct buf *bread(dev, blk)
{
bp = getblk(dev, blk);
if (bp->b_flags & B_DONE) // buffer already holds valid data
return bp;
bp->b_flags |= B_READ;
bp->b_wcount = 256;
put bp into devtab's I/O queue; (may start I/O)
iowait(bp); // sleep on bp until DONE
return bp;
}
Bonus: breada() also pre-reads the next block, so a later read of that
block need not wait for DONE.
6276 writei(aip):
aip points at inode (containing dev)
map logical block to physical block blk;
if (writing whole block)
bp = getblk(dev, blk); // find a bp
else
bp = bread(dev, blk); // read in the block into *bp
iomove(data into *bp)
Then either bawrite(bp) OR // write in order
bdwrite(bp); // random access files
4809 bwrite(bp) // sync write; wait for I/O done
{
mark bp for B_WRITE;
put bp into dev's I/O queue (may start I/O)
iowait(bp); // wait until DONE
brelse(bp); // release the buffer
}
4836 bdwrite(bp) // delayed write
{
mark bp DELWRI and DONE;
release the buffer;
}
bawrite(bp) // async write; do not wait for I/O done
{
mark bp ASYNC
call bwrite(bp) to put bp into dev I/O queue;
but do NOT wait for DONE;
// bp will be released by interrupt handler
}
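The difference between a synchronous and a delayed write can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sim_ names, the struct wbuf type, the disk_writes counter and the flag values are hypothetical, and the I/O queue, BUSY state and interrupt handling are left out.

```c
#include <assert.h>
#include <string.h>

#define BLKSZ    512
#define B_DONE   01   /* flag values are illustrative only */
#define B_DELWRI 02

struct wbuf {
    int  dev, blk, flags;
    char data[BLKSZ];
};

int disk_writes;      /* counts simulated device writes */

/* Stand-in for the device write; only counts the operation. */
static void sim_disk_write(struct wbuf *bp)
{
    assert(bp != NULL);
    disk_writes++;
}

/* Synchronous write: the device I/O happens now. */
void sim_bwrite(struct wbuf *bp)
{
    sim_disk_write(bp);
    bp->flags &= ~B_DELWRI;
    bp->flags |= B_DONE;
}

/* Delayed write: only mark the buffer dirty; no I/O yet. */
void sim_bdwrite(struct wbuf *bp)
{
    bp->flags |= B_DELWRI | B_DONE;
}

/* To be called before a buffer is reassigned to another (dev,blk). */
void sim_flush_if_dirty(struct wbuf *bp)
{
    if (bp->flags & B_DELWRI)
        sim_bwrite(bp);
}
```

Several sim_bdwrite() calls on the same buffer followed by one sim_flush_if_dirty() cost a single device write, which is the saving delayed write is after.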
2. Buffer Management in Unix:
(1). Each block device, e.g. a disk, has a dev number, and a corresponding
device-table
4551 struct devtab{ .........};
Each devtab maintains a dev_list containing I/O buffers currently
assigned to the device, and an IO-queue containing bufs for pending I/O
operations on the device.
(2). A set of NBUF I/O buffers
4520-4586 struct buf{ ..........} buf[NBUF];
is allocated in K space. Each struct buf is a buffer header
containing fields for buffer management. Each buffer header
has two sets of linking pointers: (b_forw, b_back) for the dev_list,
(av_forw, av_back) for the av_list (free list), and a data pointer
pointing at the buf's actual data area
(4720) char buffer[NBUF][514];
(3). Initialization of bufs:
5055 binit(): is called during system booting. It links all bufs
into two doubly linked lists headed by a special buf
4567 struct buf bfreelist;
Initially, all bufs are on the av_list. Whenever a buf is
assigned to a (dev,blk), it is taken out of the av_list and
inserted into the dev_list of the device's devtab structure.
If the buf is currently in use, it is marked BUSY, and removed from
the av_list. A BUSY buf may be in the I/O queue of a devtab,
using its av_list pointer. When a buffer is no longer BUSY, it
is released back to the av_list but remains on the dev_list.
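The list structure described in (1)-(3) can be sketched as follows. This is a simplified illustration, not the original source: struct buf is cut down to the linking fields, a plain struct buf is assumed to serve as the head of a device's dev_list, and the helpers sim_binit(), av_append(), av_remove(), av_count() and dev_relink() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF 4

struct buf {
    int b_flags, b_dev, b_blkno;
    struct buf *b_forw, *b_back;     /* dev_list links */
    struct buf *av_forw, *av_back;   /* av_list (free list) links */
};

struct buf buf[NBUF];
struct buf bfreelist;                /* list head only; holds no data */

/* Put bp at the tail of the av_list. */
void av_append(struct buf *bp)
{
    bp->av_back = bfreelist.av_back;
    bp->av_forw = &bfreelist;
    bfreelist.av_back->av_forw = bp;
    bfreelist.av_back = bp;
}

/* Take bp out of the av_list; its dev_list links are untouched. */
void av_remove(struct buf *bp)
{
    assert(bp != &bfreelist);
    bp->av_back->av_forw = bp->av_forw;
    bp->av_forw->av_back = bp->av_back;
    bp->av_forw = bp->av_back = NULL;
}

/* Number of bufs currently on the av_list. */
int av_count(void)
{
    int n = 0;
    struct buf *bp;

    for (bp = bfreelist.av_forw; bp != &bfreelist; bp = bp->av_forw)
        n++;
    return n;
}

/* Move bp from its current dev_list to the list headed by head. */
void dev_relink(struct buf *bp, struct buf *head)
{
    bp->b_back->b_forw = bp->b_forw;   /* unlink from the old list */
    bp->b_forw->b_back = bp->b_back;
    bp->b_forw = head->b_forw;         /* insert at the front of the new one */
    bp->b_back = head;
    head->b_forw->b_back = bp;
    head->b_forw = bp;
}

/* Link every buf into both lists headed by bfreelist. */
void sim_binit(void)
{
    int i;

    bfreelist.b_forw = bfreelist.b_back = &bfreelist;
    bfreelist.av_forw = bfreelist.av_back = &bfreelist;
    for (i = 0; i < NBUF; i++) {
        struct buf *bp = &buf[i];
        bp->b_dev = -1;                /* not assigned to any device yet */
        bp->b_forw = bp->b_back = bp;  /* self-linked so dev_relink() works */
        dev_relink(bp, &bfreelist);
        av_append(bp);
    }
}
```

Note that av_remove() followed later by av_append() never touches b_forw/b_back, which is why a released buf stays on its dev_list.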
3. Block read/write algorithms
As shown above, bread, bwrite and bdwrite depend on getblk()/brelse().
4. getblk()/brelse() algorithms: (M.J. Bach TEXT: Figure 3.4, Figure 3.6)
4921 bp = getblk(dev,blk){
loop:
(1). search devtab's dev_list for a bp = (dev,blk);
(2). if (found such a bp){
if (bp is BUSY){
mark bp WANTED;
sleep on bp;
**************
goto loop;
}
/* bp not BUSY */
take bp out of av_list; mark bp BUSY;
return(bp);
} /* end found */
----------------------------------------------
(3). /* not found; try to allocate a free buf from av_list */
if (bfreelist's av_list is empty){
mark bfreelist WANTED;
sleep on bfreelist;
************************
goto loop;
}
(4). /* at least one buf on av_list */
take first bp out of av_list;
if (this bp is for DELAYed WRITE){
write bp out ASYNC;
*******************
goto loop;
}
(5). mark bp BUSY; assign bp to (dev,blk);
relink bp to (new) dev_list;
return(bp);
}
4869 brelse(bp){
if (bp is WANTed)
wakeup() ALL sleeping on bp;
if (bfreelist is WANTed)
wakeup() ALL sleeping on bfreelist;
put bp back to the (tail of ) av_list;
}
--------------------------------------------------------------------------
COMMENTS on getblk()/brelse():
(1). Data Consistency:
In order to ensure data consistency, getblk() must never assign
two buffers to the same (dev,blk). Hence it goes back to the retry loop
after waking up from sleep(), because the buffer it wanted may already exist.
During a WRITE operation, data are written to a buffer, which is
marked DELWRI (Delayed Write) but remains in the buffer pool
until it is to be reassigned to a different (dev,blk).
Dirty buffers are written out before they are reassigned.
(2). Cache effect:
Cache effect is achieved mainly by:
brelse(bp) puts bp back to the (tail of) av_list but lets it remain in
the dev_list and retain its (dev,blk) identity until it is grabbed for
reassignment.
Once a bp is assigned to a specific (dev,blk), all efforts are
made to prolong its life, e.g. by
Delayed Write, and by releasing to the tail, but grabbing from the
front, of the av_list (Least Recently Used principle).
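The release-to-tail, grab-from-front rule can be tried out with a small single-process sketch. It is deliberately simplified and rests on assumptions: the av_list is modelled as an array queue (avq[0] is the front), there is no sleeping (NULL is returned where the real getblk() would sleep), delayed writes are ignored, and the names lru_init(), lru_getblk(), lru_brelse() and struct sbuf are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF 3

struct sbuf { int dev, blk, busy; };

static struct sbuf pool[NBUF];
static struct sbuf *avq[NBUF];   /* av_list as an array; avq[0] is the front */
static int avn;                  /* number of entries on the av_list */

void lru_init(void)
{
    int i;

    avn = 0;
    for (i = 0; i < NBUF; i++) {
        pool[i].dev = pool[i].blk = -1;   /* no identity yet */
        pool[i].busy = 0;
        avq[avn++] = &pool[i];
    }
}

/* Remove the entry at position pos from the av_list. */
static void av_take(int pos)
{
    int i;

    assert(pos >= 0 && pos < avn);
    for (i = pos; i < avn - 1; i++)
        avq[i] = avq[i + 1];
    avn--;
}

struct sbuf *lru_getblk(int dev, int blk)
{
    int i, j;
    struct sbuf *bp;

    for (i = 0; i < NBUF; i++) {
        bp = &pool[i];
        if (bp->dev == dev && bp->blk == blk) {
            if (bp->busy)
                return NULL;     /* the real getblk() would sleep here */
            for (j = 0; j < avn; j++)
                if (avq[j] == bp)
                    break;
            av_take(j);          /* cache hit: take it from wherever it sits */
            bp->busy = 1;
            return bp;
        }
    }
    if (avn == 0)
        return NULL;             /* no free buffer; the real one would sleep */
    bp = avq[0];                 /* miss: grab from the FRONT */
    av_take(0);
    bp->dev = dev;
    bp->blk = blk;
    bp->busy = 1;
    return bp;
}

void lru_brelse(struct sbuf *bp)
{
    assert(bp->busy && avn < NBUF);
    bp->busy = 0;
    avq[avn++] = bp;             /* release to the TAIL; identity is kept */
}
```

With three buffers holding blocks 10, 11 and 12, re-referencing block 10 moves it to the tail, so a request for a new block 13 reassigns the buffer of block 11, the least recently used one.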
(3). Critical Regions:
Disk interrupt handlers may manipulate the buf lists, e.g.
dequeue a bp from a devtab's IO-queue, change its status and
call brelse(bp).
So, in getblk()/brelse(), disk interrupts are masked out in these
critical regions.
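The idea of masking disk interrupts around a critical region can be illustrated with a toy simulation. All of it is hypothetical: interrupts are modelled by ordinary function calls, mask_intr()/unmask_intr() stand in for whatever the kernel uses to raise and lower the interrupt mask, and free_count stands in for the shared buffer lists.

```c
#include <assert.h>

static int masked;    /* nonzero while "disk interrupts" are masked out */
static int pending;   /* interrupts that arrived while masked */
int free_count;       /* shared state, e.g. number of bufs on the av_list */

/* What the disk interrupt handler does to shared state, e.g. brelse(bp). */
static void disk_intr_handler(void)
{
    free_count++;
}

/* A disk interrupt arrives: run the handler now, or defer it if masked. */
void raise_disk_intr(void)
{
    if (masked)
        pending++;
    else
        disk_intr_handler();
}

void mask_intr(void)
{
    masked = 1;
}

/* Leaving the critical region delivers any deferred interrupts. */
void unmask_intr(void)
{
    assert(pending >= 0);
    masked = 0;
    while (pending > 0) {
        pending--;
        disk_intr_handler();
    }
}
```

Between mask_intr() and unmask_intr() a raise_disk_intr() leaves free_count unchanged, so code inside the region sees a consistent list; the handler's update appears only after unmask_intr().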
(4). Shortcomings of the algorithm:
1. Inefficiency: the algorithm relies on re-try loops after sleep()/
wakeup().
2. No concurrent reads (for multiprocessor kernel).
3. Possible starvation.
4. Uses sleep()/wakeup(), which is suitable only for a Uniprocessor kernel.
-----------------------------------------------------------------------
CS 560 TAKEHOME EXAM
NOTE: THIS IS AN EXAM! ABSOLUTELY INDEPENDENT WORK !!!!
PROBLEM SPECIFICATIONS:
Use P,V on counting semaphores to design a set of NEW I/O buffer
management algorithms:
Part A: for Uniprocessor (UP) Kernel
Part B: for MultiProcessor (MP) Kernel
that meet the following requirements:
(NOTE: The conditions are ordered by their relative importance, which will
also be the basis of GRADing)
PART A: DUE Nov 9, 2011
Assume Uniprocessor Kernel (One process at a time)
(1). Data consistency.
(2). Cache effect.
******** (1) and (2) are the same as in Unix ********
(3). Efficiency:
No re-try loops.
No unnecessary process "wakeups", i.e. a blocked process
is not "awakened" unless it can actually get a buffer.
(4). Free of starvation.
NOTE AGAIN:
1. Merely replacing sleep()/wakeup() in Unix algorithms by P()/V() on
semaphores is NOT an acceptable solution. You MUST redesign the algorithms
by using semaphores ONLY.
2. (1)(2)(3)(4) are the ranking of their relative importance.
For example, if an algorithm cannot guarantee data consistency,
it would be INCORRECT no matter how efficient it is. Similarly,
since (4) ranks far below (2), your algorithm must ensure no
starvation BUT NOT AT THE EXPENSE OF REDUCED CACHE EFFECT.
Express your algorithm(s) in Pseudo-C with lots of comments
and/or a separate document to explain your algorithm(s).
Type up your work in a form suitable for printing hard copies.
*************************************************************************
Nov 9, 2011 : Part A due.
Grading : Each of your algorithms will be graded in 2 steps:
First draft on the posted due date.
ONE revision 1 week after original due date.
=========================================================================
PART B: First draft DUE : Nov 16, 2011
Revision DUE : Nov 30, 2011
Assume Multiprocessor Kernel. Buffers are maintained in hash queues,
as in Bach Chapters 3, 12. Add these additional requirements:
(0). High degree of concurrency (which MP algorithms must have)
(5). Allow concurrent readers on the same buffer.
(6). Free of starvation and deadlock.
NOTE FOR MP ALGORITHMS:
In addition to P()/V(), you may define any other "primitive" operations
on semaphores, e.g. CP (Conditional P) as in Bach's Chapter 12.
============================================================================
Time Table:
Algorithms: Completed by Nov 30, 2011
Project : Before Finals Week.
PROJECT
-----------------------------------------------------------------------------
Implement AND demo your algorithms on a multitasking platform that simulates
either a UP or MP kernel (to be introduced before Thanksgiving break).
Close Week: Project demonstration.
-----------------------------------------------------------------------------