Purpose: Parallel algorithms in SMP; Preliminaries for Course Project.
0. Review of Critical Region (CR), atomic instruction, spinlock and semaphores.
(1). CR = operations on shared data object which can only be accessed by one
process at a time.
(2). Atomic instruction and spinlock:
! ------- implementation of slock(&x)/sunlock(&x) -----------------
slock:
pushl %ebp
movl %esp,%ebp
pushl %ebx # save ebx
movl 8(%ebp),%ebx # pointer to spin lock x
spin: movl $1,%eax
xchg (%ebx),%eax # ATOMIC: get x and set x=1
bt $0,%eax # CF = bit 0 of the old value of x
jc spin # spin if x was already 1
popl %ebx # return only if x was 0
popl %ebp
ret
! sunlock(&x)
sunlock:
pushl %ebp
movl %esp,%ebp
pushl %ebx
movl 8(%ebp),%ebx # pointer to spinlock x
xorl %eax,%eax # eax=0
xchg (%ebx),%eax # ATOMIC: set x=0
popl %ebx
popl %ebp
ret
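For readers more comfortable with C than with assembly, the same idea can be sketched with C11 atomics. This is only an illustrative user-space analogue, not the kernel's code: the type name spinlock_t and the helper stry_lock() are assumptions introduced here, and atomic_exchange() plays the role of the xchg instruction.

```c
#include <stdatomic.h>

/* Hypothetical C11 counterpart of slock()/sunlock(); not the kernel's code. */
typedef atomic_uint spinlock_t;          /* 0 = free, 1 = held */

void slock(spinlock_t *x)
{
    /* atomic_exchange stores 1 and returns the old value, like xchg */
    while (atomic_exchange(x, 1u) == 1u)
        ;                                /* spin while x was already 1 */
}

void sunlock(spinlock_t *x)
{
    atomic_store(x, 0u);                 /* release: set x = 0 */
}

/* Non-blocking variant (an assumption, not in the notes):
   returns 1 if the lock was acquired, 0 if it was already held. */
int stry_lock(spinlock_t *x)
{
    return atomic_exchange(x, 1u) == 0u;
}
```

A caller brackets its critical region with slock(&x) and sunlock(&x), exactly as with the assembly version.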
(3). Semaphore and P/V Operations
typedef struct semaphore{
u32 lock; // per-semaphore spin lock; initialized to 0
int value;
struct proc *queue;
}SEMAPHORE;
where the per-semaphore spin lock is to ensure that every operation on a
semaphore is a critical region. The P and V operations are implemented as
int P(struct semaphore *s)
{
PROC *p; int ps;
ps=int_off(); // disable CPU interrupts
slock(&s->lock); // acquire the semaphore spin lock
s->value--;
if (s->value < 0){
running->status=BLOCK;
running->sem = s; // for un-do P operation on semaphore
enqueue(&s->queue, running);
sunlock(&s->lock); // release spin lock
tswitch(); // give up CPU
}
else
sunlock(&s->lock); // release spin lock
int_on(ps); // restore CPU interrupt mask
}
int V(struct semaphore *s)
{
PROC *p; int ps;
ps=int_off(); // disable CPU interrupts
slock(&s->lock); // spin lock
s->value++;
if (s->value <= 0){
p = dequeue(&s->queue);
p->sem = 0;
p->status = READY;
schedule(p);
}
sunlock(&s->lock); // release spin lock
int_on(ps); // restore CPU interrupt mask
}
Similarly, we may define and implement ANY OTHER operation on a semaphore,
provided that it operates in the CR of the semaphore. Examples:
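/* pseudocode: the test and the P(s)/V(s) must execute in the same CR of s */
Conditional_P(): int CP(struct semaphore *s){
if (s->value > 0){
P(s); return 1;
}
return 0;
}
Conditional_V(): int CV(struct semaphore *s){
if (s->value < 0)
V(s);
}
int value(struct semaphore *s){ return s->value; }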
etc.
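As a hedged illustration of "operating in the CR of the semaphore", the following sketch performs the test and the decrement of a conditional P under the same per-semaphore lock. It is a simplified user-space model: struct csem has no wait queue, and the names csem, csem_lock(), csem_unlock() and csem_value() are assumptions made for this example only.

```c
#include <stdatomic.h>

/* Simplified, hypothetical semaphore: no wait queue, user space only. */
struct csem {
    atomic_uint lock;                    /* per-semaphore spinlock, 0 = free */
    int value;
};

static void csem_lock(struct csem *s)   { while (atomic_exchange(&s->lock, 1u)) ; }
static void csem_unlock(struct csem *s) { atomic_store(&s->lock, 0u); }

/* Conditional P: take one unit only if that cannot block.
   The test and the decrement happen under the same lock. */
int CP(struct csem *s)
{
    int ok = 0;
    csem_lock(s);
    if (s->value > 0) {
        s->value--;
        ok = 1;
    }
    csem_unlock(s);
    return ok;
}

/* Read the value inside the CR; the result may be stale as soon as
   the lock is released, so callers should not base decisions on it. */
int csem_value(struct csem *s)
{
    csem_lock(s);
    int v = s->value;
    csem_unlock(s);
    return v;
}
```

With an initial value of 2, two calls to CP() succeed and the third returns 0 without blocking.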
1. Review of the Producer-Consumer Problem
A set of Producer processes share a finite set of buffers with another set
of Consumer processes. They operate as follows.
Initially, all buffer cells are empty. When a Producer puts an item
into an empty cell, the cell becomes full. When a Consumer gets an
item from a full cell, that cell becomes empty, etc. Naturally, each
cell can only contain ONE item at a time. A producer must WAIT if there are
no empty cells. Similarly, a Consumer must WAIT if there are no full
cells. Furthermore, WAITing processes should be allowed to continue when
their awaited events occur. Figure 6- shows a solution to the Producer-
Consumer problem, in which mutex semaphores are for processes to access
the circular buffer as CRs and producers and consumers cooperate by
the full and empty semaphores.
DATA buf[N]; /* N buffer cells */
int head=0, tail=0; /* indices into buffer cells */
SEMAPHORES empty = N; full = 0; pmutex = 1; cmutex = 1;
Producer: Consumer:
----------------------------------------------------------------
while (1){ | while(1){
produce an item; | ...................;
P(empty); | P(full);
P(pmutex); | P(cmutex);
buf[head++] = item; | item = buf[tail++];
head %= N; | tail %= N;
V(pmutex); | V(cmutex);
V(full); | V(empty);
................; | /* consume item */
} | }
----------------------------------------------------------------
Producer-Consumer Problem Solution
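The index arithmetic and the roles of the empty and full counts can be checked with a small single-threaded model. This is a sketch under stated assumptions, not the semaphore solution itself: the counters empty and full only stand in for the two counting semaphores, and try_put()/try_get() return 0 at the point where a real producer or consumer would block in P(). The names NCELL, struct ring, try_put and try_get are introduced here for illustration.

```c
#define NCELL 4                  /* number of buffer cells (assumed value) */

struct ring {
    int buf[NCELL];
    int head, tail;              /* head: next cell to fill; tail: next to drain */
    int empty, full;             /* stand-ins for the two counting semaphores */
};

void ring_init(struct ring *r)
{
    r->head = r->tail = 0;
    r->empty = NCELL;            /* all cells empty initially */
    r->full = 0;
}

/* Returns 1 on success, 0 where a real producer would block in P(empty). */
int try_put(struct ring *r, int item)
{
    if (r->empty == 0) return 0;
    r->empty--;                  /* corresponds to P(empty) */
    r->buf[r->head++] = item;
    r->head %= NCELL;
    r->full++;                   /* corresponds to V(full) */
    return 1;
}

/* Returns 1 on success, 0 where a real consumer would block in P(full). */
int try_get(struct ring *r, int *item)
{
    if (r->full == 0) return 0;
    r->full--;                   /* corresponds to P(full) */
    *item = r->buf[r->tail++];
    r->tail %= NCELL;
    r->empty++;                  /* corresponds to V(empty) */
    return 1;
}
```

Items come out in the order they went in, and empty + full always equals NCELL, which is the invariant the two semaphores maintain.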
2. Pipes
Pipes are unidirectional interprocess communication channels for processes
to exchange data. A pipe has a read end and a write end. Data written to the
write end of a pipe can be read from the read end of the pipe. Since their
debut in the original Unix, pipes have been incorporated into most operating systems, with
many variations. Some systems allow pipes to be bidirectional, in which data can
be transmitted in both directions. Ordinary pipes are for related processes.
Named pipes are FIFO communication channels between unrelated processes. Reading
and writing pipes are usually synchronous and blocking. Some systems also
support nonblocking and asynchronous read/write operations on pipes.
For simplicity, we shall consider a pipe as a finite-sized FIFO communication
channel between a set of related processes. Reader and writer processes of a
pipe are synchronized in the following manner. When a reader tries to read from
a pipe, if the pipe has data, the reader reads as much as it needs (up to the
pipe size) and returns the number of bytes read. If the pipe has no data but
there are still writers, the reader "waits" for data. When a writer writes data
to a pipe with waiting readers, it "wakes up" the waiting readers, allowing them
to continue. If the pipe has no data and also no writer, the reader returns 0.
Since readers wait for data if the pipe still has writers, the 0 return value
means only one thing: the pipe has no more data and also no writer. In that
case, the reader can decide what to do next. When a writer writes to a pipe, if
the pipe has room, the writer writes as much as it needs to or until the pipe is
"full", i.e. no more room. If the pipe has no room but still has readers, the
writer "waits" for room. When a reader reads data from the pipe to create more
room, it "wakes up" any waiting writers, allowing them to continue. However,
when a writer tries to write to a pipe that has no reader, the writer detects
this as a "broken pipe" error and aborts.
2.1. Pipe Programming in Unix/Linux
Although a pipe is capable of supporting multiple writer-reader processes,
the most common usage is to connect a pair of writer-reader processes through
each pipe. For example, the Unix command
cat filename | grep pattern
connects a process executing cat and another process executing grep by a pipe.
In Unix, pipes are supported by a set of pipe related syscalls. The syscall
pipe(int pd[2]) creates a pipe in the kernel and returns two file descriptors
in pd[2], where pd[0] is for reading from the pipe and pd[1] is for writing to
the pipe. However, a pipe is not intended for a single process. For example,
after creating a pipe, if the process tries to read even 1 byte from the pipe,
as in
char c; read(pd[0], &c, 1);
the process will never return from the read syscall. This is because when the
process tries to read from the pipe, there is no data but the pipe has a writer,
so it "waits" for data. But who is the writer? The process itself. So the
process waits for itself, thereby locking itself out so to speak. Conversely, if
the process tries to write more than the pipe size (4KB in traditional Unix,
64KB by default in current Linux), the
process would again wait for itself when the pipe becomes full. Thus, while
any process may create a pipe and get two file descriptors, a process can only
be either a reader or a writer on a pipe, but not both. The correct usage of
pipe is as follows. After creating a pipe, the process forks a child process.
During forking, the child inherits all the opened file descriptors of the
parent. So both the parent and child have the same read/write pipe descriptors.
The user must designate one of the processes as a writer and the other as a
reader of the pipe. The order does not matter as long as each process is
designated to play only a single role. Assume that the parent is chosen as the
writer and the child as the reader. Then each process must close its unwanted
pipe descriptor, i.e. the writer must close its pd[0] and the reader must close
its pd[1]. Then the parent can write to the pipe and the child can read from the
pipe. The following figure shows the system model of pipe operations.
---------------------------- Unix kernel -------------------------------------
writerProc readerProc
fd[pd[1]] ----> writeOFT ---> PIPE ----> readOFT ------> fd[pd[0]]
^ |
---- | ------------------------------------------------------ | --------------
| | v
int n = write(pd[1],wbuf,nbytes)| int n = read(pd[0],rbuf,nbytes)
|
int pd[x|1]; char wbuf[ ] | int pd[0|x]; char rbuf[ ]
|
Writer process Uimage | reader process Uimage
---------------------------------------------------------------------------
In the figure, a writer process issues a write(pd[1], wbuf, nbytes) syscall
to enter the OS kernel. It uses the file descriptor pd[1] to access the PIPE
through the writeOFT. It executes write_pipe() to write data into the PIPE's
buffer, waiting for room if necessary. On the right-hand side of the figure, a
reader process issues a read(pd[0],rbuf,nbytes) to enter the OS kernel. It uses
the file descriptor pd[0] to access the PIPE through the readOFT. Then it
executes read_pipe() to read data from the PIPE's buffer, waiting for data if
necessary. The writer process may terminate first when it has no more data to
write, in which case the reader can continue to read as long as the PIPE still
has data. However, if the reader terminates first, the writer should see a
broken pipe error and also terminate. Note that the broken pipe condition is
not symmetrical. It is a condition of a communication channel on which there are
writers but no readers. The reverse is not a broken pipe. The following program
demonstrates pipes in Unix/Linux.
/********************** Unix Pipe Example ***************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   // pipe(), fork(), read(), write(), close(), getpid()
int pd[2], n, i;
char line[256], *s="data from pipe ";
int main()
{
  if (pipe(pd) < 0){           // create a pipe
     perror("pipe"); exit(1);
  }
  if (fork()){                 // fork a child as READER, parent as WRITER
     printf("parent %d close pd[0]\n", getpid());
     close(pd[0]);
     while(i++ < 10){          // parent writes to pipe
        printf("parent %d writing pipe : %s\n", getpid(), s);
        write(pd[1], s, strlen(s));
     }
     printf("parent %d exit\n", getpid());
     exit(0);
  }
  else{                        // child as pipe READER
     printf("child %d close pd[1]\n", getpid());
     close(pd[1]);
     while(1){                 // child reads from pipe
        printf("child %d reading from pipe\n", getpid());
        // read at most 255 bytes so that line[n]=0 stays inside line[256]
        if ((n = read(pd[0], line, 255)) <= 0)  // 0: no data and no writer
           exit(0);
        line[n]=0; printf("%s n=%d\n",line, n);
     }
  }
}
Run the program under Linux and observe its behavior with the following
experiments.
(1). Let the parent be the reader and child be the writer.
(2). Let the writer write continuously and the reader only read 10 times.
In case (1), the sh prompt will reappear only when the reader dies. In case
(2), the writer will die by a broken pipe signal SIGPIPE=13 as soon as the
reader exits.
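The asymmetry between the two ends can also be observed inside a single process. The sketch below is one possible way to do it, using only standard POSIX calls: with SIGPIPE ignored, a write to a pipe that has no reader fails with errno set to EPIPE instead of killing the process, while a read from a pipe that has no writer first drains any buffered data and then returns 0. The function names write_without_reader() and read_without_writer() are made up for this example.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe whose read end is closed. With SIGPIPE ignored,
   write() fails with EPIPE. Returns that errno, 0 if the write
   unexpectedly succeeded, or -1 if pipe() failed. */
int write_without_reader(void)
{
    int pd[2];
    int err = 0;
    if (pipe(pd) < 0) return -1;
    signal(SIGPIPE, SIG_IGN);            /* get the error through errno */
    close(pd[0]);                        /* no reader left */
    if (write(pd[1], "x", 1) < 0)
        err = errno;
    close(pd[1]);
    return err;
}

/* Read from a pipe whose write end is closed: first the buffered data,
   then 0. Returns the result of the second read(), or -1 on any
   unexpected result. */
long read_without_writer(void)
{
    int pd[2];
    char c[8];
    if (pipe(pd) < 0) return -1;
    if (write(pd[1], "ab", 2) != 2) return -1;
    close(pd[1]);                        /* no writer left */
    ssize_t n1 = read(pd[0], c, sizeof c);   /* drains the 2 buffered bytes */
    ssize_t n2 = read(pd[0], c, sizeof c);   /* no data and no writer */
    close(pd[0]);
    return (n1 == 2) ? (long)n2 : -1;
}
```

Without the signal(SIGPIPE, SIG_IGN) call, the first function would terminate the process by SIGPIPE, which is what experiment (2) shows.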
3. Pipes and the Producer-Consumer Problem
In principle, pipes are similar to the producer-consumer problem, but there
are differences. In the producer-consumer problem, processes run forever. They
are synchronized beautifully as long as both producer and consumer processes
exist. However, in the real world processes do not behave that way. For
instance, what if all the consumers have died (abnormally, perhaps)? Should
producers still write items to the shared buffer? If they do, then what for?
since there is no one to consume the items anymore. Worse yet, if producers
continue to write, eventually all of them will be blocked when the buffer is
full, a degenerated form of deadlock. On the consumer side, what if all the
producers have died? Should they still wait for items that will never come? If
so, they will also end up in a degenerated form of deadlock. In the idealistic
producer-consumer problem, such real issues are totally ignored. The solution
may be very elegant but has little practical value. However, if we amend the
producer-consumer problem to make it applicable to the real world, the result is
a pipe.
Assume that a pipe has a char buffer of size PSIZE, with the following variables
and semaphores for process synchronization.
int nreader = number of readers, nwriter = number of writers;
SEMAPHORE data = number of chars in the pipe buffer, initial value=0;
SEMAPHORE room = number of free spaces in the pipe buffer, initial value=PSIZE;
SEMAPHORE rmutex = wmutex = locking semaphores, initial value=1.
Then, we can express pipe write/read algorithms as follows.
--------------------------------------------------------------------------
int write_pipe(int n) | int read_pipe(int n)
{ | {
int r = n; | int r = 0;
while (n){ | while(n){
W1: if (pipe has no reader) | R1: if (pipe has no writer && no data)
exit(BROKEN_PIPE); | return r;
W2: P(room); | R2: P(data);
W3: if (pipe has no reader) | R3: if (pipe has no writer && no data)
exit(BROKEN_PIPE); | return r;
P(wmutex); | P(rmutex);
write a char to pipe; | read a char from pipe;
n--; | n--; r++;
V(wmutex); | V(rmutex);
V(data); | V(room);
} | }
return r; | return r;
} | }
--------------------------------------------------------------------------
The algorithms are exactly the same as that of the producer-consumer problem,
except for the tests at W1, W3, R1 and R3. In write_pipe(), a writer must check
whether the pipe still has any reader. If not, it must abort with a BROKEN_PIPE
error before doing P(room). Otherwise, it may block at W2 forever. When a writer
that was blocked at W2 resumes, it must check for BROKEN_PIPE error again since
the readers may have terminated. Similarly, in read_pipe(), a reader must check
whether the pipe has any writer and data before doing P(data). Otherwise, it
may wait for data that will never come. When a reader that was blocked at R2
resumes, it cannot assume that there are data in the pipe, as in the
producer-consumer problem, because the writers may have terminated. The above algorithms
work if and only if we can perform the tests at W1, W3, R1 and R3 reliably. With
the pipe variables nreader and nwriter, the problem seems to be trivial. For
example, at W1, we can test the value of nreader. At R1, we can test the values
of both nwriter and the semaphore data. Unfortunately, such tests work only in
UP kernels but not in SMP.
In an SMP environment, testing the value of a semaphore is of no use, even if
the test is performed in a critical region, because the semaphore value may be
changed by another process (on a different CPU) immediately after the test.
Therefore, any decision based on such a test would be unreliable. For example,
suppose a pipe reader has just found that there are still writers but no data,
so it intends to wait for data. Before it completes the "WAIT" operation, all
the writers (on other CPUs) may have died. If so, the reader would act on a
stale decision and wait forever. Similarly, suppose a writer has just found
that there are still readers and intends to "WAIT" for room in the pipe. Before
it completes the "WAIT" operation, all the readers (on other CPUs) may have
died. If so, the writer would also be blocked forever.
In the following, we shall show the implementation of pipes in the SMP_MTX
kernel, which is based on the following principle: all the pipe operations are
to be performed in a single critical region. An obvious way to achieve this is
to use a giant lock to enclose all the pipe operations in a single critical
region. For efficiency, such a lock should be a spinlock. However, once we have
such a giant lock, the problem changes completely. First, using semaphores
inside a common critical region becomes superfluous. It would be much more
efficient to operate on the pipe variables directly. Second, semaphores are
suitable for read/write data objects of the same size, e.g. a byte, but not for
a variable number of bytes as in the case of read/write pipes. For these reasons
we shall not use semaphores but the modified sleep()/wakeup() on steroids, in
which sleep(event, spinlock) puts the caller to sleep and releases the spinlock,
to implement pipes for SMP. The algorithms of pipe operations in the SMP_MTX kernel
are as follows.
(1). The PIPE structure:
struct pipe{
char buf[PSIZE]; // circular buf[PSIZE], index head for
int head, tail; // write char, index tail for read char
int data, room; // numbers of data and room in buf[ ]
int nreader, nwriter; // number of readers, writers on pipe
int spin; // spinlock = 0
int busy; // pipe status: FREE or in use
} pipe[NPIPES]; // global pipe[NPIPES] in t.c file
(2). kpipe(int pd[2]):
allocate a FREE pipe and 2 Open File Tables (OFTs), one for READ_PIPE,
another for WRITE_PIPE. Initialize the pipe struct and allocate 2 file
descriptors for pd[2].
(3). int close_pipe(int fd)
{
from fd, get OFT and pipe pointer p;
slock(&p->spin); // lock pipe;
if (OFT is READ_PIPE){
if (--p->nreader==0){
free OFT;
if (p->nwriter==0) free pipe;
}
wakeup(&p->room); // wakeup all writers;
}
if (OFT is WRITE_PIPE){
if (--p->nwriter==0){
free OFT;
if (p->nreader==0) free pipe;
}
wakeup(&p->data); // wakeup all readers;
}
free file descriptor fd;
sunlock(&p->spin); // unlock pipe
return OK;
}
(4). int read_pipe(int fd, char *buf, int n)
{
int r = 0;
if (n<=0) return 0;
validate fd; from fd, get OFT and pipe pointer p;
while(1){
slock(&p->spin); // lock pipe; re-acquired after every sleep
while(p->data && n){
read a byte from pipe to buf in Umode;
n--; r++; p->data--; p->room++;
}
if (n==0 || r){ // has read some data
wakeup(&p->room); // wakeup writers
sunlock(&p->spin); // unlock pipe
return r;
}
// pipe has no data
if (p->nwriter){ // if pipe still has writer
wakeup(&p->room); // wakeup writers
sleep(&p->data, &p->spin); // sleep for data, then unlock pipe
continue; // retry from slock()
}
// pipe has no writer and no data
sunlock(&p->spin); // unlock pipe
return 0;
}
}
(5). int write_pipe(int fd, char *buf, int n)
{
if (n<=0) return 0;
validate fd; from fd, get OFT and pipe pointer p;
int r = 0;
while (n){
slock(&p->spin); // lock pipe
if (p->nreader == 0){ // no more readers
sunlock(&p->spin); kexit(BROKEN_PIPE);
}
while(p->room && n){
get a byte from buf in Umode, write to pipe;
r++; p->data++; p->room--; n--;
}
wakeup(&p->data); // wakeup readers
if (n==0){ // finished writing n bytes
sunlock(&p->spin); // unlock pipe
return r;
}
// still has data to write but pipe has no room
sleep(&p->room, &p->spin); // sleep for room, then unlock pipe
}
}
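A user-space analogue of these algorithms can be sketched with a POSIX mutex and condition variables. This is an assumption-laden illustration, not the SMP_MTX code: the mutex plays the role of p->spin, and pthread_cond_wait() plays the role of sleep(event, spinlock) because it releases the lock and blocks in one atomic step, then re-acquires the lock before returning. The names struct upipe, upipe_read(), upipe_write() and upipe_close(), and the small PSIZE, are all hypothetical.

```c
#include <stddef.h>
#include <pthread.h>

#define PSIZE 8                          /* small on purpose (assumed value) */

struct upipe {                           /* hypothetical user-space analogue */
    char buf[PSIZE];
    int head, tail, data, room;
    int nreader, nwriter;
    pthread_mutex_t lock;                /* plays the role of p->spin */
    pthread_cond_t has_data, has_room;   /* play the roles of the sleep events */
};

void upipe_init(struct upipe *p)
{
    p->head = p->tail = p->data = 0;
    p->room = PSIZE;
    p->nreader = p->nwriter = 1;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->has_data, NULL);
    pthread_cond_init(&p->has_room, NULL);
}

/* Returns the number of bytes read; 0 means no data and no writer. */
int upipe_read(struct upipe *p, char *out, int n)
{
    int r = 0;
    pthread_mutex_lock(&p->lock);
    while (r == 0 && n > 0) {
        while (p->data && n) {
            out[r++] = p->buf[p->tail];
            p->tail = (p->tail + 1) % PSIZE;
            p->data--; p->room++; n--;
        }
        if (r || p->nwriter == 0) break;
        /* the test above and this wait cannot be separated by a writer exit */
        pthread_cond_wait(&p->has_data, &p->lock);
    }
    pthread_cond_broadcast(&p->has_room);    /* wake up writers */
    pthread_mutex_unlock(&p->lock);
    return r;
}

/* Returns the number of bytes written, or -1 for a broken pipe. */
int upipe_write(struct upipe *p, const char *in, int n)
{
    int r = 0;
    pthread_mutex_lock(&p->lock);
    while (n > 0) {
        if (p->nreader == 0) { r = -1; break; }   /* no more readers */
        while (p->room && n) {
            p->buf[p->head] = in[r++];
            p->head = (p->head + 1) % PSIZE;
            p->data++; p->room--; n--;
        }
        pthread_cond_broadcast(&p->has_data);     /* wake up readers */
        if (n > 0) pthread_cond_wait(&p->has_room, &p->lock);
    }
    pthread_mutex_unlock(&p->lock);
    return r;
}

/* Close one end and wake the other side so that it re-tests the counts. */
void upipe_close(struct upipe *p, int is_writer)
{
    pthread_mutex_lock(&p->lock);
    if (is_writer) { p->nwriter--; pthread_cond_broadcast(&p->has_data); }
    else           { p->nreader--; pthread_cond_broadcast(&p->has_room); }
    pthread_mutex_unlock(&p->lock);
}
```

Because every test of nreader, nwriter, data and room is made while holding the lock, and the wait releases the lock atomically, the stale-decision problem described for the semaphore version does not arise in this sketch.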
In terms of parallel programming, pipes represent an extreme case in that
the problem cannot be easily parallelized. This is because pipe data are FIFO.
It is therefore not possible to decompose a pipe's data buffer into separate
pieces to support concurrency. In contrast, there are many other cases in which
the problem can, and should be, parallelized. Algorithms designed for SMP
operations should strive for both improved concurrency and better efficiency.
4. I/O Buffer Management Problem and Algorithms
4.1. Purpose:
A set of NBUF buffers in K space is used as a cache memory between
block devices, e.g. disks, and processes doing block read/write.
The goal is to reduce the number of actual I/O operations.
Basic principle:
When a process wants to read from (dev,blk), it first searches the
buffer cache for a buffer assigned to this (dev,blk). If such a
buffer exists with valid data, it simply reads data from the buffer
without incurring any I/O operation. If such a buffer does not exist,
it tries to find a free buffer, assigns the buffer to (dev,blk), issues
a DiskRead() operation, waits for I/O completion, then reads data from
the buffer. Once a (dev,blk) is read in, the buffer will remain in the
buffer cache for next possible read requests for the same (dev,blk) by ANY
process.
When a process wants to write to (dev,blk), it writes to a buffer
assigned to (dev,blk). Actual writing to the device may take place
much later.
4.2. Buffer Management in Unix:
(1). A set of NBUF I/O buffers, buf[NBUF]; Each buf has 2 pointers:
freePtr -> next buffer in a freelist;
devPtr -> next buffer in a devlist;
(2). Initially, all bufs are on the freelist. Whenever a buf is assigned to a
(dev,blk), it is taken out of the freelist and inserted into the devlist.
If the buf is currently in use, it is marked BUSY, and removed from
the freelist. A BUSY buf may be in the I/O queue of a device. When a buf
is no longer BUSY, it is released back to the freelist but remains in the
devlist.
(3). getblk()/brelse() algorithms of Unix
bp = getblk(dev,blk){
loop:
(1). search devlist for a bp = (dev,blk);
(2). if (found such a bp){
if bp is BUSY:{
mark bp WANTED;
sleep on bp;
**************
goto loop;
}
/* bp not BUSY */
take bp out of freelist; mark bp BUSY;
return(bp);
} /* end found */
----------------------------------------------
(3). /* not found; try to allocate a free buf from freelist */
if (freelist empty){
mark freelist WANTED;
sleep on freelist;
************************
goto loop;
}
(4). /* at least one buf on freelist */
take first bp out of freelist;
if (this bp is for DELAYed WRITE){
write bp out ASYNC;
*******************
goto loop;
}
(5). mark bp BUSY; assign bp to (dev,blk);
relink bp to (new) devlist;
return(bp);
}
brelse(bp){
if (bp is WANTed)
wakeup() ALL sleeping on bp;
if (freelist is WANTed)
wakeup() ALL sleeping on freelist;
put bp back to the (tail of ) freelist;
}
--------------------------------------------------------------------------
COMMENTS on Unix getblk()/brelse():
(1). Data Consistency:
In order to ensure data consistency, getblk() must never assign
two buffers to the same (dev,blk). =====> go to retry loop after
waking up from sleep() because what it wanted may already exist.
During a WRITE operation, data are written to a buffer, which is
marked DELWRI (Delayed Write) but remains in the buffer pool
until it is to be reassigned to a different (dev,blk).
Dirty buffers are written out before they are reassigned.
(2). Cache effect:
Cache effect is achieved mainly by:
brelse(bp) puts bp back to the (tail of) freelist but lets it remain in
the devlist and retain its (dev,blk) identity until it is grabbed for
reassignment.
Once a bp is assigned to a specific (dev,blk), all efforts are
made to prolong its life, e.g. by
Delayed Write, and releasing to the tail, but grabbing from the
front, of the freelist. (LeastRecentlyUsed principle).
(3). Critical Regions:
Disk interrupt handlers may manipulate the freelist, e.g. dequeue a bp
from a device IO-queue, change its status and call brelse(bp).
So, in getblk()/brelse(), disk interrupts are masked out in these
critical regions.
(4). Shortcomings of the algorithm:
1. Inefficiency: the algorithm relies on re-try loops after sleep()/
wakeup().
2. No concurrent reads (for multiprocessor kernel).
3. Possible starvation.
4. Use sleep()/wakeup(), good only for Uniprocessor kernel.
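The cache effect described in comment (2) can be demonstrated with a small single-threaded model. This is a sketch under several assumptions, not the Unix code: there is no sleeping, no delayed write and no real devlist (a linear search over the buffers stands in for it), and getblk() simply returns NULL where the kernel version would sleep. NBUF is kept tiny so that the replacement order is easy to follow.

```c
#include <stddef.h>

#define NBUF 3                            /* tiny cache (assumed size) */

struct buf {
    int dev, blk;                         /* identity; -1 means unassigned */
    int busy;
    struct buf *next_free;
};

static struct buf bufs[NBUF];
static struct buf *free_head, *free_tail;

static void free_append(struct buf *bp)   /* release to the TAIL of freelist */
{
    bp->next_free = NULL;
    if (free_tail) free_tail->next_free = bp; else free_head = bp;
    free_tail = bp;
}

static void free_remove(struct buf *bp)   /* unlink bp wherever it is */
{
    struct buf **pp = &free_head, *prev = NULL;
    while (*pp && *pp != bp) { prev = *pp; pp = &(*pp)->next_free; }
    if (*pp == NULL) return;              /* bp was not on the freelist */
    *pp = bp->next_free;
    if (free_tail == bp) free_tail = prev;
}

void cache_init(void)
{
    free_head = free_tail = NULL;
    for (int i = 0; i < NBUF; i++) {
        bufs[i].dev = bufs[i].blk = -1;
        bufs[i].busy = 0;
        free_append(&bufs[i]);
    }
}

/* Returns NULL where the kernel version would sleep
   (buffer BUSY, or freelist empty). */
struct buf *getblk(int dev, int blk)
{
    for (int i = 0; i < NBUF; i++)        /* stands in for the devlist search */
        if (bufs[i].dev == dev && bufs[i].blk == blk) {
            if (bufs[i].busy) return NULL;
            free_remove(&bufs[i]);
            bufs[i].busy = 1;
            return &bufs[i];              /* cache hit: no I/O needed */
        }
    struct buf *bp = free_head;           /* grab from the FRONT (oldest) */
    if (bp == NULL) return NULL;
    free_remove(bp);
    bp->dev = dev; bp->blk = blk; bp->busy = 1;
    return bp;
}

void brelse(struct buf *bp)
{
    bp->busy = 0;
    free_append(bp);                      /* keeps its (dev,blk) identity */
}
```

Releasing to the tail and reassigning from the front means that a recently released buffer is the last candidate for reassignment, so a second getblk() for the same (dev,blk) is likely to find it still in the cache.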
4.3. Simple PV algorithm
First, we define the following semaphores.
BUFFER buf[NBUF]; // NBUF I/O buffers
SEMAPHORE free = NBUF; // a counting semaphore for FREE buffers
SEMAPHORE buf[i].sem = 1; // each buffer has a lock sem=1;
For convenience, we shall refer to the semaphore of each buffer by the buffer
itself. As in the Unix algorithm, initially all buffers are in the freelist and
all device lists and I/O queues are empty. Most students tend to underestimate
the problem and may come up with the following algorithm.
BUFFER *getblk(dev, blk)
{
while(1){
(1). search dev_list for bp=(dev,blk);
(2). if (bp in dev_list){
(3). P(bp);
remove bp from freelist;
return bp;
}
// bp not in cache, create a bp=(dev,blk)
(4). P(free); // get a free buffer;
(5). get a buffer from freelist;
assign buffer to (dev,blk);
return bp;
}
}
brelse(BUFFER *bp)
{
(6). enter bp into freelist;
V(bp);
V(free);
}
Unfortunately, such an algorithm is incorrect. To see this, assume that several
processes need the same buffer, which does not exist, and there are no more free
buffers. Then, all of them would be blocked at (4). When buffers are released as
free at (6), these processes would wake up to create the same buffer multiple
times. To prevent multiple buffers, such processes would have to execute from
(1) again, which amounts to retry. In addition, such an algorithm also has other
race conditions.
The following shows a CORRECT simple PV-algorithm for buffer management.
BUFFER *getblk(dev, blk)
{
while(1){
(1). P(free); // get a free buffer first
search dev_list for bp=(dev,blk);
(2). if (find such a bp){
(3). if (bp not BUSY){
remove bp from freelist;
P(bp); // lock bp but does not wait
return bp;
}
// bp in cache but BUSY
V(free); // give up the free buffer
(4). P(bp); // wait in bp queue
return bp;
}
// bp not in cache, try to create a bp=(dev,blk)
(5). bp = first buffer taken out of freelist;
P(bp); // lock bp, no wait
(6). if (bp dirty){
awrite(bp);
continue; // continue to (1)
}
(7). reassign bp to (dev,blk); // mark bp data invalid, not dirty
return bp;
}
}
brelse(BUFFER *bp)
{
(8). if (bp queue has waiter)
V(bp);
else if (bp dirty && free queue has waiter)
(9). awrite(bp);
else{
(10). enter bp into (tail of) freelist;
V(bp);
V(free);
}
}
Next, we show that the simple PV-algorithm meets all the design requirements.
(1). Assigned buffers are unique:
In getblk(), if there are free buffers, the process does not wait at (1).
Then it searches the dev_list. If the needed buffer already exists, the
process does not create the same buffer again. If the needed buffer does
not exist, the process creates the needed buffer by using a free buffer,
which it is guaranteed to have. Once the needed buffer is created, it will be
in the dev_list so that no other process will create it again. If there are
no free buffers, it is possible for several processes, which need the same
buffer, to be blocked at (1). When a free buffer is released at (10), it
"wakes up" only one process blocked at (1), which allows only one process
to create the needed buffer. Therefore, multiple buffers cannot occur and
every assigned buffer is unique.
(2). No retry loops:
The only place a process re-executes the while(1) loop is at (6), but that
is not a retry loop because the process is continually executing.
(3). No "unnecessary wakeups" of processes:
In getblk(), a process may "wait" either for a free buffer at (1) or for
the needed buffer at (4). In either case, the process is not "woken up" to
run again until it has a buffer. Furthermore, at (9), when a dirty buffer
is to be released as free but there are waiters for free buffers at (1),
the buffer is not released but written out directly. This avoids an
unnecessary process wakeup. The reader is encouraged to figure out why.
(4). No race conditions:
This is because only one process can run at a time. The situation in which
a buffer is intended for one process but grabbed by another process due to
their orders of execution cannot occur.
(5). Cache effect:
Unlike the Unix algorithm, in which a released buffer is up for grabs, in
the PV-algorithm, a buffer with waiters is always kept for reuse. Thus, a
buffer that is still in demand is never reassigned. A buffer is released as
free only if it has no waiters. This should enhance the buffer's cache
effect.
(6). No starvation for free buffers.
In getblk(), if there are no free buffers, all requesting processes will
be blocked by P(free) at (1). This implies that while there are processes
waiting for free buffer, all buffers in use cannot admit any new users.
This guarantees that a BUSY buffer will eventually be released as free.
Therefore, starvation for free buffers cannot occur.
The simple PV-algorithm works fine and is easy to implement. However, it
does have the following two weaknesses. First, its cache effect may not be
optimal. This is because as soon as there is no free buffer, all new requesting
processes will be blocked at P(free), even if their needed buffer may already
exist in the cache. Second, when a process "wakes up" from free.queue, it may find
the needed buffer already exists but is BUSY, in which case it will be blocked
again at (4) by P(bp). Strictly speaking, the process has been "woken up"
unnecessarily since it gets blocked twice. I do have another semaphore based
OPTIMAL PV-algorithm, which does not have such weaknesses.
4.4. Simple PV_algorithm for SMP
Define SPINLOCK sbuf=0; SEMAPHORE free=NBUF; each buffer has a lock semaphore=1
struct buf *getblk(int dev, int blk)
{
while(1){
(1). P(free); // get a free buf
(2). slock(sbuf); // acquire the spinlock
(3). if (bp in cache){
if (bp is locked){
V(free); // give up the free buf
sunlock(sbuf); // MUST release the sbuf spinlock first
<<==================== time GAP: things can happen to bp
P(bp); // wait for bp
if (bp changed){
V(bp); // if bp changed, give up bp and retry
continue;
}
return bp;
}
// bp in cache and is FREE
out_freelist(bp); // remove bp from freelist
P(bp); // lock bp
sunlock(sbuf); // release spinlock
return bp;
}
// bp not in cache; already has a free buf in hand
bp = dequeue(freelist);
P(bp); // lock the buffer
if (bp DIRTY){ // delayed write buf, can't use it
awrite(bp);
sunlock(sbuf);
continue; // continue while(1) loop
}
// bp is a new buffer; reassign it to (dev,blk)
reassign bp to (dev,blk); // bp may change dev list
sunlock(sbuf);
return bp;
}
}
where PV(s, spin) atomically blocks the process on semaphore s and releases the
spinlock, similar to sleep(event, spinlock).
int brelse(struct buf *bp)
{
(1). slock(sbuf); // acquire spinlock
if (bp has waiter){ // bp has waiter
V(bp);
sunlock(sbuf); // release spinlock
return;
}
(2). if (bp DIRTY && freeQ){
awrite(bp); // write bp ASYNC
sunlock(sbuf); // release spinlock
return;
}
slock(sfree);
enter_freelist(bp); // release bp as FREE
sunlock(sfree);
V(bp); V(free);
sunlock(sbuf); // release spinlock
}
In the modified PV_algorithm, since both getblk() and brelse() are executed in
the same critical region of a spinlock, the buffer data structures do not need
any additional protection. The modified PV_algorithm is therefore a trivial
extension of the UP version. The algorithm works for SMP but it has a major
drawback in that it does not allow for any concurrency. If we examine the I/O
buffer management problem closely, we should see that it differs from pipes in
an important aspect. In the I/O buffer management case, buffers are maintained
in separate data structures, such as device lists and freelist. These separate
data structures lend themselves naturally to parallel operations.
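One possible direction, given here only as a hedged sketch and not as the algorithm of these notes, is to give each device list its own spinlock so that operations on different lists do not contend for a single lock. Everything in the example is an assumption made for illustration: the number of lists NLIST, the mapping list_index(), and the helpers list_trylock() and list_unlock().

```c
#include <stdatomic.h>

#define NLIST 4                           /* number of device lists (assumed) */

struct devlist {
    atomic_uint lock;                     /* per-list spinlock, 0 = free */
    int nbuf;                             /* stand-in for the list contents */
};

static struct devlist devlists[NLIST];

/* Hypothetical mapping of (dev,blk) to one device list. */
unsigned list_index(unsigned dev, unsigned blk)
{
    return (dev * 31u + blk) % NLIST;
}

/* Returns 1 if the list lock was acquired, 0 where slock() would spin. */
int list_trylock(unsigned i)
{
    return atomic_exchange(&devlists[i].lock, 1u) == 0u;
}

void list_unlock(unsigned i)
{
    atomic_store(&devlists[i].lock, 0u);
}
```

Two processes working on buffers that map to different lists can then hold their locks at the same time, whereas two processes working on the same list still exclude each other. The freelist would need its own lock and a fixed lock ordering, which this sketch does not address.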
----------------------------------------------------------------------------