Purpose: Parallel algorithms in SMP; Preliminaries for Course Project.

0. Review of Critical Region (CR), atomic instruction, spinlock and semaphores.

(1). CR = operations on shared data object which can only be accessed by one 
        process at a time.
   
(2). Atomic instruction and spinlock:
  
! ------- implementation of slock(&x)/sunlock(&x) -----------------
slock:
       pushl %ebp
       movl  %esp,%ebp
       pushl %ebx          # save ebx
       movl  8(%ebp),%ebx  # pointer to spin lock x
spin:  movl  $1,%eax
       xchg  (%ebx),%eax   # ATOMIC: get x and set x=1
       bt    $0,%eax       # CF = bit 0 of the old value of x
       jc    spin          # spin if x was already 1
       popl  %ebx          # return only if x was 0
       popl  %ebp
       ret

! sunlock(&x)
sunlock:
       pushl %ebp
       movl  %esp,%ebp
       pushl %ebx
       movl  8(%ebp),%ebx  # pointer to spinlock x
       xorl  %eax,%eax     # eax=0
       xchg  (%ebx),%eax   # ATOMIC: set x=0
       popl  %ebx
       popl  %ebp
       ret
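
For comparison, the same test-and-set idea can be sketched in portable C11.
This is only an illustration, not the kernel's code: the names spin_trylock(),
spin_lock() and spin_unlock() are hypothetical, and atomic_exchange() plays the
role of the xchg instruction.

```c
#include <stdatomic.h>

/* Hypothetical C11 counterparts of slock()/sunlock(); x == 0 means unlocked. */
static int spin_trylock(atomic_int *x)
{
    /* atomically fetch the old value of *x and set *x = 1 */
    return atomic_exchange(x, 1) == 0;   /* 1 if the lock was free */
}

static void spin_lock(atomic_int *x)
{
    while (!spin_trylock(x))
        ;                                /* spin while x was already 1 */
}

static void spin_unlock(atomic_int *x)
{
    atomic_store(x, 0);                  /* set x = 0 */
}
```

As in the assembly version, the lock is acquired only by the caller whose
exchange returns the old value 0; every other caller sees 1 and keeps spinning.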

(3). Semaphore and P/V Operations 

typedef struct semaphore{
  u32 lock;                 // per-semaphore spin lock; initialized to 0
  int value;
  struct proc *queue;
}SEMAPHORE;

where the per-semaphore spin lock is to ensure that every operation on a 
semaphore is a critical region. The P and V operations are implemented as

int P(struct semaphore *s)
{
  PROC *p; int ps;
  ps=int_off();             // disable CPU interrupts 
  slock(&s->lock);          // acquire the semaphore spin lock
    s->value--;
    if (s->value < 0){
       running->status=BLOCK;
       running->sem = s;    // for un-do P operation on semaphore    
       enqueue(&s->queue, running);
       sunlock(&s->lock);   // release spin lock
       tswitch();           // give up CPU
    }
    else
       sunlock(&s->lock);   // release spin lock
  int_on(ps);               // restore CPU interrupt mask
}

int V(struct semaphore *s)
{
  PROC *p; int ps;
  ps=int_off();            // disable CPU interrupts
  slock(&s->lock);         // spin lock
    s->value++;
    if (s->value <= 0){
        p = dequeue(&s->queue);
        p->sem = 0;
        p->status = READY;
        schedule(p);
    }
 sunlock(&s->lock);       // release spin lock
 int_on(ps);              // restore CPU interrupt mask
}

Similarly, we may define and implement ANY OTHER operation on a semaphore, 
provided that it operates in the CR of the semaphore. Examples:

   Conditional_P():  int CP(struct semaphore *s){
                         if (s->value > 0){
                            P(s); return 1;
                         }
                         return 0;
                     }
   Conditional_V():  int CV(struct semaphore *s){
                        if (s->value < 0){
                            V(s); return 1;
                        }
                        return 0;
                     }

   int value(s){  return s->value; }

   etc.
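
The requirement that such an operation runs inside the CR of the semaphore can
be sketched as follows. This is a simplified user-level illustration under
stated assumptions: struct csem, cs_lock(), cs_unlock() and cond_P() are
hypothetical names, and the process queue is omitted because a conditional P
never blocks.

```c
#include <stdatomic.h>

/* Simplified, hypothetical semaphore: no process queue, because the
   conditional operation below never blocks. */
struct csem {
    atomic_int lock;    /* per-semaphore spin lock, 0 = free */
    int value;
};

static void cs_lock(struct csem *s)   { while (atomic_exchange(&s->lock, 1)) ; }
static void cs_unlock(struct csem *s) { atomic_store(&s->lock, 0); }

/* Conditional P: take one unit only if that cannot block. */
static int cond_P(struct csem *s)
{
    int ok = 0;
    cs_lock(s);              /* enter the CR of the semaphore */
    if (s->value > 0) {
        s->value--;          /* same effect as a P() that does not wait */
        ok = 1;
    }
    cs_unlock(s);            /* leave the CR */
    return ok;
}
```

The test and the decrement are done under the same lock, so no other process
can change the value between them.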
 
1. Review of the Producer-Consumer Problem

   A set of Producer processes share a finite set of buffers with another set
   of Consumer processes. They operate as follows.

   Initially, all buffer cells are empty. When a Producer puts an item 
   into an empty cell, the cell becomes full. When a Consumer gets an 
   item from a full cell, that cell becomes empty, etc. Naturally, each 
   cell can only contain ONE item at a time. A producer must WAIT if there are
   no empty cells. Similarly, a Consumer must WAIT if there are no full 
   cells. Furthermore, WAITing processes should be allowed to continue when 
   their awaited events occur. The following figure shows a solution to the
   Producer-Consumer problem, in which the mutex semaphores pmutex and cmutex
   make each access to the circular buffer a CR, and producers and consumers
   cooperate through the full and empty semaphores.

          DATA buf[N];              /* N buffer cells  */
          int  head = 0, tail = 0;  /* indices to buffer cells */
          SEMAPHORE empty = N, full = 0, pmutex = 1, cmutex = 1;
                          
         Producer:                          Consumer:
    ----------------------------------------------------------------
    while (1){                 |    while(1){           
       produce an item;        |         ...................;
       P(empty);               |         P(full);
        P(pmutex);             |           P(cmutex);
          buf[head++] = item;  |              item = buf[tail++];
          head %= N;           |              tail %= N;
        V(pmutex);             |           V(cmutex);
       V(full);                |         V(empty);
       ................;       |         /* consume item */
     }                         |    }
    ----------------------------------------------------------------
                 Producer-Consumer Problem Solution
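
The solution above can be tried at user level with POSIX threads. The sketch
below is an illustration only, assuming a system with POSIX unnamed semaphores
(e.g. Linux): sem_wait()/sem_post() stand in for P()/V() on empty and full,
pthread mutexes stand in for pmutex and cmutex, and run_demo() is a hypothetical
driver that runs one producer and one consumer.

```c
#include <pthread.h>
#include <semaphore.h>

#define N 4                       /* buffer cells */

static int buf[N], head, tail;
static sem_t empty, full;         /* counting semaphores */
static pthread_mutex_t pmutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cmutex = PTHREAD_MUTEX_INITIALIZER;
static int nitems;                /* items each side handles */
static long total;                /* sum of consumed items */

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < nitems; i++) {
        sem_wait(&empty);                 /* P(empty)  */
        pthread_mutex_lock(&pmutex);      /* P(pmutex) */
        buf[head++] = i;
        head %= N;
        pthread_mutex_unlock(&pmutex);    /* V(pmutex) */
        sem_post(&full);                  /* V(full)   */
    }
    return 0;
}

static void *consumer(void *arg)
{
    (void)arg;
    for (int i = 0; i < nitems; i++) {
        sem_wait(&full);                  /* P(full)   */
        pthread_mutex_lock(&cmutex);      /* P(cmutex) */
        int item = buf[tail++];
        tail %= N;
        pthread_mutex_unlock(&cmutex);    /* V(cmutex) */
        sem_post(&empty);                 /* V(empty)  */
        total += item;                    /* consume item */
    }
    return 0;
}

/* Run one producer and one consumer; return the sum of consumed items. */
long run_demo(int n)
{
    pthread_t p, c;
    nitems = n; total = 0; head = tail = 0;
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    pthread_create(&p, 0, producer, 0);
    pthread_create(&c, 0, consumer, 0);
    pthread_join(p, 0);
    pthread_join(c, 0);
    sem_destroy(&empty);
    sem_destroy(&full);
    return total;
}
```

Since the producer writes the items 0..n-1 and every item is consumed exactly
once, run_demo(n) should return n(n-1)/2.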

2. Pipes

   Pipes are unidirectional interprocess communication channels for processes
to exchange data. A pipe has a read end and a write end. Data written to the
write end of a pipe can be read from the read end of the pipe. Since their 
debut in the original Unix, pipes have been incorporated into most OS, with 
many variations. Some systems allow pipes to be bidirectional, in which data can
be transmitted in both directions. Ordinary pipes are for related processes.
Named pipes are FIFO communication channels between unrelated processes. Reading
and writing pipes are usually synchronous and blocking. Some systems also 
support nonblocking and asynchronous read/write operations on pipes. 

For simplicity, we shall consider a pipe as a finite-sized FIFO communication 
channel between a set of related processes. Reader and writer processes of a 
pipe are synchronized in the following manner. When a reader tries to read from
a pipe, if the pipe has data, the reader reads as much as it needs (up to the 
pipe size) and returns the number of bytes read. If the pipe has no data but 
there are still writers, the reader "waits" for data. When a writer writes data
to a pipe with waiting readers, it wakes up the waiting readers, allowing them 
to continue. If the pipe has no data and also no writer, the reader returns 0. 
Since readers wait for data if the pipe still has writers, the 0 return value 
means only one thing: the pipe has no more data and also no writer. In that 
case, the reader can decide what to do next. When a writer writes to a pipe, if
the pipe has room, the writer writes as much as it needs to or until the pipe is
"full", i.e. no more room. If the pipe has no room but still has readers, the 
writer "waits" for room. When a reader reads data from the pipe to create more 
room, it wakes up any waiting writers, allowing them to continue. However, 
when a writer tries to write to a pipe that has no reader, the writer detects 
this as a "broken pipe" error and aborts.
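
The reader-side rule (a return value of 0 means no data and no writer) can be
observed with a real Unix pipe. The following is a minimal sketch;
read_until_eof() is a hypothetical helper name, and the process plays both roles
only because the 3 bytes written fit in the pipe, so nothing blocks.

```c
#include <unistd.h>

/* Returns the value of the second read(): 0 means "no data and no writer". */
int read_until_eof(void)
{
    int pd[2];
    char c[8];
    if (pipe(pd) < 0)
        return -1;
    if (write(pd[1], "abc", 3) != 3)         /* 3 bytes fit in the pipe */
        return -1;
    close(pd[1]);                            /* the only writer goes away */
    int n1 = (int)read(pd[0], c, sizeof c);  /* data still there: returns 3 */
    int n2 = (int)read(pd[0], c, sizeof c);  /* no data, no writer: returns 0 */
    close(pd[0]);
    return n1 == 3 ? n2 : -1;
}
```

Note that the first read still succeeds after the writer has closed its end:
data left in the pipe remains readable, and only then does read return 0.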

2.1. Pipe Programming in Unix/Linux

    Although a pipe is capable of supporting multiple writer-reader processes, 
the most common usage is to connect a pair of writer-reader processes through 
each pipe. For example, the Unix command
           cat filename | grep pattern
connects a process executing cat and another process executing grep by a pipe.
In Unix, pipes are supported by a set of pipe related syscalls. The syscall 
pipe(int pd[2]) creates a pipe in the kernel and returns two file descriptors 
in pd[2], where pd[0] is for reading from the pipe and pd[1] is for writing to 
the pipe. However, a pipe is not intended for a single process. For example, 
after creating a pipe, if the process tries to read even 1 byte from the pipe, 
as in
         char c;  read(pd[0], &c, 1);
the process will never return from the read syscall. This is because when the 
process tries to read from the pipe, there is no data but the pipe has a writer,
so it "waits" for data. But who is the writer? The process itself. So the
process waits for itself, thereby locking itself out so to speak. Conversely, if
the process tries to write more than the pipe size (traditionally 4KB; 64KB by
default on current Linux), the process would again wait for itself when the pipe
becomes full. Thus, while 
any process may create a pipe and get two file descriptors, a process can only 
be either a reader or a writer on a pipe, but not both. The correct usage of 
pipe is as follows. After creating a pipe, the process forks a child process. 
During forking, the child inherits all the opened file descriptors of the 
parent. So both the parent and child have the same read/write pipe descriptors.
The user must designate one of the processes as a writer and the other as a 
reader of the pipe. The order does not matter as long as each process is 
designated to play only a single role. Assume that the parent is chosen as the 
writer and the child as the reader. Then each process must close its unwanted
pipe descriptor, i.e. the writer must close its pd[0] and the reader must close 
its pd[1]. Then the parent can write to the pipe and the child can read from the
pipe. The following figure shows the system model of pipe operations. 

  ---------------------------- Unix kernel -------------------------------------
   writerProc                                              readerProc
   fd[pd[1]] ----> writeOFT ---> PIPE ----> readOFT ------> fd[pd[0]]
      ^                                                        |                
 ---- | ------------------------------------------------------ | --------------
      |                           |                            v
  int n = write(pd[1],wbuf,nbytes)|      int n = read(pd[0],rbuf,nbytes)
                                  |
     int pd[x|1]; char wbuf[ ]    |        int pd[0|x]; char rbuf[ ] 
                                  |
      Writer process Uimage       |         reader process Uimage
 ---------------------------------------------------------------------------

  In the figure, a writer process issues a write(pd[1], wbuf, nbytes) syscall 
to enter the OS kernel. It uses the file descriptor pd[1] to access the PIPE 
through the writeOFT. It executes write_pipe() to write data into the PIPE's 
buffer, waiting for room if necessary. On the right-hand side of the figure, a
reader process issues a read(pd[0],rbuf,nbytes) to enter the OS kernel. It uses
the file descriptor pd[0] to access the PIPE through the readOFT. Then it 
executes read_pipe() to read data from the PIPE's buffer, waiting for data if 
necessary. The writer process may terminate first when it has no more data to 
write, in which case the reader can continue to read as long as the PIPE still 
has data. However, if the reader terminates first, the writer should see a 
broken pipe error and also terminate. Note that the broken pipe condition is 
not symmetrical. It is a condition of a communication channel on which there are
writers but no readers. The reverse is not a broken pipe. The following program
demonstrates pipes in Unix/Linux.

/********************** Unix Pipe Example ***************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   // pipe(), fork(), read(), write(), close(), getpid()

int  pd[2], n, i;
char line[256], *s="data from pipe ";

int main(void)
{
   pipe(pd);     // create a pipe
   if (fork()){  // fork a child as READER, parent as WRITER
       printf("parent %d close pd[0]\n", getpid());
       close(pd[0]);
       while(i++ < 10){   // parent writes to pipe 
           printf("parent %d writing pipe : %s\n", getpid(), s);
           write(pd[1], s, strlen(s));
       }
       printf("parent %d exit\n", getpid());
       exit(0);
   }
   else{         // child as pipe READER
       printf("child  %d close pd[1]\n", getpid());
       close(pd[1]);
       while(1){  // child read from pipe
            printf("child  %d reading from pipe\n", getpid());
            if ((n = read(pd[0], line, sizeof(line)-1)) <= 0)
                exit(0);      // 0: no data and no writer; <0: read error
            line[n]=0; printf("%s  n=%d\n",line,  n);
       }
   }
}

Run the program under Linux and observe its behavior with the following 
experiments.

   (1). Let the parent be the reader and child be the writer.
   (2). Let the writer write continuously and the reader only read 10 times.

   In case (1), the sh prompt will reappear only when the reader dies. In case
(2), the writer will die by a broken pipe signal SIGPIPE=13 as soon as the 
reader exits.
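
The writer-side condition in case (2) can also be seen without being killed by
the signal. The sketch below is an illustration only: write_without_reader() is
a hypothetical helper name, and it ignores SIGPIPE so that write() returns an
error (EPIPE) instead of terminating the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe that has no reader; return the resulting errno. */
int write_without_reader(void)
{
    int pd[2];
    signal(SIGPIPE, SIG_IGN);          /* otherwise SIGPIPE kills the process */
    if (pipe(pd) < 0)
        return -1;
    close(pd[0]);                      /* no reader left */
    ssize_t n = write(pd[1], "x", 1);  /* fails at once; does not wait for room */
    int err = (n < 0) ? errno : 0;
    close(pd[1]);
    return err;
}
```

With the default signal disposition, the same write() would deliver SIGPIPE and
the writer would die, as described above.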

3. Pipes and the Producer-Consumer Problem

    In principle, pipes are similar to the producer-consumer problem, but there
are differences. In the producer-consumer problem, processes run forever. They 
are synchronized beautifully as long as both producer and consumer processes 
exist. However, in the real world processes do not behave that way. For 
instance, what if all the consumers have died (abnormally, perhaps)? Should
producers still write items to the shared buffer? If they do, then what for,
since there is no one to consume the items anymore? Worse yet, if producers 
continue to write, eventually all of them will be blocked when the buffer is 
full, a degenerated form of deadlock. On the consumer side, what if all the 
producers have died? Should they still wait for items that will never come? If 
so, they will also end up in a degenerated form of deadlock. In the idealistic
producer-consumer problem, such real issues are totally ignored. The solution 
may be very elegant but has little practical value. However, if we amend the 
producer-consumer problem to make it applicable to the real world, the result is
a pipe. 

Assume that a pipe has a char buffer of size PSIZE, with the following variables
and semaphores for process synchronization.

     int nreader = number of readers, nwriter = number of writers;
     SEMAPHORE data = number of chars in the pipe buffer, initial value=0;
     SEMAPHORE room = number of free spaces in the pipe buffer, initial value=PSIZE;
     SEMAPHORE rmutex = wmutex = locking semaphores, initial value=1.

Then, we can express pipe write/read algorithms as follows.

   --------------------------------------------------------------------------
   int write_pipe(int n)           |   int read_pipe(int n)
   {                               |   { 
     int r = n;                    |     int r = 0;
     while (n){                    |     while(n){
  W1:  if (pipe has no reader)     |  R1:  if (pipe has no writer && no data)
           exit(BROKEN_PIPE);      |           return r;
  W2:  P(room);                    |  R2:  P(data);
  W3:  if (pipe has no reader)     |  R3:  if (pipe has no writer && no data)
           exit(BROKEN_PIPE);      |           return r;
       P(wmutex);                  |       P(rmutex);
         write a char to pipe;     |         read a char from pipe;
         n--;                      |         n--; r++; 
       V(wmutex);                  |       V(rmutex);
       V(data);                    |       V(room);
     }                             |     }    
     return r;                     |     return r;
   }                               |   } 
   --------------------------------------------------------------------------

The algorithms are exactly the same as those of the producer-consumer problem, 
except for the tests at W1, W3, R1 and R3. In write_pipe(), a writer must check
whether the pipe still has any reader. If not, it must abort with a BROKEN_PIPE
error before doing P(room). Otherwise, it may block at W2 forever. When a writer
that was blocked at W2 resumes, it must check for BROKEN_PIPE error again since
the readers may have terminated. Similarly, in read_pipe(), a reader must check 
whether the pipe has any writer and data before doing P(data). Otherwise, it
may wait for data that will never come. When a reader that was blocked at R2 
resumes, it cannot assume that there are data in the pipe, as in the producer-
consumer problem, because the writers may have terminated. The above algorithms
work if and only if we can perform the tests at W1, W3, R1 and R3 reliably. With
the pipe variables nreader and nwriter, the problem seems to be trivial. For 
example, at W1, we can test the value of nreader. At R1, we can test the values
of both nwriter and the semaphore data. Unfortunately, such tests work only in 
UP kernels but not in SMP. 

In an SMP environment, testing the value of a semaphore is of no use, even if it
is performed in a critical region, since the semaphore value may be changed by 
another process (on a different CPU) immediately after the test. Therefore, any 
decision based on such tests would be unreliable. For example, suppose a pipe
reader has just checked that there are still writers but no data, so it intends
to wait for data. Before it completes the "WAIT" operation, all the writers (on
other CPUs) may have died. If so, the reader would follow the incorrect decision
and wait forever. Similarly, suppose a writer has just checked that there are
still readers and intends to "WAIT" for room in the pipe. Before it completes
the "WAIT" operation, all the readers (on other CPUs) may have died. If so, the
writer would also be blocked forever.


In the following, we shall show the implementation of pipes in the SMP_MTX 
kernel, which is based on the following principle: all the pipe operations are 
to be performed in a single critical region. An obvious way to achieve this is 
to use a giant lock to enclose all the pipe operations in a single critical 
region. For efficiency, such a lock should be a spinlock. However, once we have
such a giant lock, the problem changes completely. First, using semaphores 
inside a common critical region becomes superfluous. It would be much more 
efficient to operate on the pipe variables directly. Second, semaphores are 
suitable for reading/writing data objects of the same size, e.g. a byte, but not
for a variable number of bytes, as in the case of read/write pipes. For these
reasons, we shall not use semaphores but a modified sleep()/wakeup() on steroids
to implement pipes for SMP. The algorithms of pipe operations in the SMP_MTX kernel
are as follows. 

(1). The PIPE structure:
     struct pipe{
             char  buf[PSIZE];       // circular buf[PSIZE], index head for 
             int   head, tail;       // write char, index tail for read char
             int   data, room;       // numbers of data and room in buf[ ]
             int   nreader, nwriter; // number of readers, writers on pipe
             int   spin;             // spinlock = 0
             int   busy;             // pipe status: FREE or in use
     } pipe[NPIPES];                 // global pipe[NPIPES] in t.c file

(2). kpipe(int pd[2]):

     allocate a FREE pipe and 2 Open File Tables (OFTs), one for READ_PIPE, 
     another for WRITE_PIPE. Initialize the pipe struct and allocate 2 file 
     descriptors for pd[2]. 

(3). int close_pipe(int fd)
     {
        from fd, get OFT and pipe pointer p;
        slock(&p->spin);      // lock pipe;  
        if (OFT is READ_PIPE){
            if (--p->nreader==0){ 
               free OFT;
               if (p->nwriter==0) free pipe;
            }
            wakeup(&p->room);  // wakeup all writers;
        }
        if (OFT is WRITE_PIPE){
            if (--p->nwriter==0){ 
               free OFT;
               if (p->nreader==0) free pipe;
            }
            wakeup(&p->data);  // wakeup all readers;
        } 
        free file descriptor fd;      
        sunlock(&p->spin);    // unlock pipe
        return OK;
     }

(4). int read_pipe(int fd, char *buf, int n)
     {
        if (n<=0) return 0;
        validate fd; from fd, get OFT and pipe pointer p;
        int r = 0;
        while(1){
           slock(&p->spin);                // lock pipe
           while(p->data && n){
                read a byte from pipe to buf in Umode;
                n--; r++; p->data--; p->room++;
           }
           if (n==0 || r){                 // has read some data
              wakeup(&p->room);            // wakeup writers
              sunlock(&p->spin);           // unlock pipe
              return r;
           }
           // pipe has no data
           if (p->nwriter){                // if pipe still has writer
              wakeup(&p->room);            // wakeup writers
              sleep(&p->data, &p->spin);   // sleep for data, then unlock pipe
              continue;                    // lock pipe and try again
           }
           // pipe has no writer and no data
           sunlock(&p->spin);              // unlock pipe
           return 0;
        }
     }

(5). int write_pipe(int fd, char *buf, int n)
     {
        if (n<=0) return 0;
        validate fd; from fd, get OFT and pipe pointer p;
        int r = 0;
        while (n){
           slock(&p->spin);            // lock pipe
           if (p->nreader == 0){       // no more readers
               sunlock(&p->spin); kexit(BROKEN_PIPE);
           }
           while(p->room && n){
               get a byte from buf in Umode, write to pipe;
               r++; p->data++; p->room--; n--;
           }
           wakeup(&p->data);          // wakeup readers
           if (n==0){                 // finished writing n bytes 
              sunlock(&p->spin);      // unlock pipe
              return r;
           }
           // still has data to write but pipe has no room
           sleep(&p->room, &p->spin); // sleep for room, then unlock pipe
        }
     }
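
The pipe algorithms above can be imitated at user level. The following is a
sketch under stated assumptions, not the SMP_MTX code: a pthread mutex plays the
role of the pipe spinlock, condition variables play the roles of the events
&p->data and &p->room, and all upipe_* names are hypothetical. Two differences
should be kept in mind: pthread_cond_wait() reacquires the mutex when it returns
(the kernel sleep() only releases the spinlock), and the writer returns
BROKEN_PIPE instead of calling kexit().

```c
#include <pthread.h>

#define PSIZE 8
#define BROKEN_PIPE (-1)

struct upipe {
    char buf[PSIZE];
    int head, tail, data, room;
    int nreader, nwriter;
    pthread_mutex_t lock;          /* plays the role of p->spin  */
    pthread_cond_t  has_data;      /* plays the event &p->data   */
    pthread_cond_t  has_room;      /* plays the event &p->room   */
};

void upipe_init(struct upipe *p)   /* one reader and one writer */
{
    p->head = p->tail = p->data = 0;
    p->room = PSIZE;
    p->nreader = p->nwriter = 1;
    pthread_mutex_init(&p->lock, 0);
    pthread_cond_init(&p->has_data, 0);
    pthread_cond_init(&p->has_room, 0);
}

int upipe_read(struct upipe *p, char *b, int n)
{
    int r = 0;
    if (n <= 0) return 0;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        while (p->data && n) {
            b[r++] = p->buf[p->tail++]; p->tail %= PSIZE;
            n--; p->data--; p->room++;
        }
        if (r || p->nwriter == 0)  /* got data, or no data and no writer */
            break;
        pthread_cond_wait(&p->has_data, &p->lock); /* unlock + sleep, atomically */
    }
    pthread_cond_broadcast(&p->has_room);          /* wakeup writers */
    pthread_mutex_unlock(&p->lock);
    return r;
}

int upipe_write(struct upipe *p, const char *b, int n)
{
    int r = 0;
    if (n <= 0) return 0;
    pthread_mutex_lock(&p->lock);
    while (n) {
        if (p->nreader == 0) { r = BROKEN_PIPE; break; }
        while (p->room && n) {
            p->buf[p->head++] = b[r++]; p->head %= PSIZE;
            n--; p->data++; p->room--;
        }
        pthread_cond_broadcast(&p->has_data);      /* wakeup readers */
        if (n) pthread_cond_wait(&p->has_room, &p->lock);
    }
    pthread_mutex_unlock(&p->lock);
    return r;
}

void upipe_close(struct upipe *p, int writer)
{
    pthread_mutex_lock(&p->lock);
    if (writer) { p->nwriter--; pthread_cond_broadcast(&p->has_data); }
    else        { p->nreader--; pthread_cond_broadcast(&p->has_room); }
    pthread_mutex_unlock(&p->lock);
}
```

Because the test of nwriter (or nreader) and the wait happen under the same
lock, and closing an end also takes that lock, a process cannot decide to wait
on stale information, which is the property the single critical region is meant
to provide.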

    In terms of parallel programming, pipes represent an extreme case in that
the problem cannot be easily parallelized. This is because pipe data are FIFO. 
It is therefore not possible to decompose a pipe's data buffer into separate 
pieces to support concurrency. In contrast, there are many other cases in which
the problem can, and should be, parallelized. Algorithms designed for SMP 
operations should strive for both improved concurrency and better efficiency.

4. I/O Buffer Management Problem and Algorithms

4.1. Purpose:

   A set of NBUF buffers in K space is used as a cache memory between
   block devices, e.g. disks, and processes doing block read/write.
   The goal is to reduce the number of actual I/O operations.

                       Basic principle:

   When a process wants to read from (dev,blk), it first searches the
   buffer cache for a buffer assigned to this (dev,blk).  If such a
   buffer exists with valid data, it simply reads data from the buffer
   without incurring any I/O operation.  If such a buffer does not exist,
   it tries to find a free buffer, assigns the buffer to (dev,blk), issues 
   a DiskRead() operation, waits for I/O completion, then reads data from 
   the buffer.  Once a (dev,blk) is read in, the buffer will remain in the 
   buffer cache for next possible read requests for the same (dev,blk) by ANY
   process.

   When a process wants to write to (dev,blk), it writes to a buffer 
   assigned to (dev,blk).  Actual writing to the device may take place 
   much later.
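
   The read path described above can be sketched in a few lines. This is a
   deliberately simplified illustration: bread(), disk_read() and the
   round-robin choice of a buffer are hypothetical, and there is no locking,
   no BUSY state and no delayed write. It only shows that a second read of the
   same (dev,blk) is served from the cache without any I/O.

```c
#include <string.h>

#define NBUF    4
#define BLKSIZE 16

struct buf {
    int dev, blk;
    int valid;                 /* 1 if data[] holds the block's contents */
    char data[BLKSIZE];
};

static struct buf cache[NBUF];
static int ndiskread;          /* counts simulated device reads */
static int nextfree;           /* naive round-robin buffer allocation */

/* Hypothetical stand-in for a real device read. */
static void disk_read(struct buf *bp)
{
    memset(bp->data, 'A' + bp->blk % 26, BLKSIZE);
    ndiskread++;
}

/* Return a buffer holding (dev,blk), reading the device only on a miss. */
struct buf *bread(int dev, int blk)
{
    for (int i = 0; i < NBUF; i++)          /* search the cache first */
        if (cache[i].valid && cache[i].dev == dev && cache[i].blk == blk)
            return &cache[i];               /* hit: no I/O */
    struct buf *bp = &cache[nextfree++ % NBUF];  /* miss: take a buffer */
    bp->dev = dev; bp->blk = blk;
    disk_read(bp);                          /* one real I/O operation */
    bp->valid = 1;
    return bp;
}
```

   Calling bread(1,5) twice increments ndiskread only once; the second call
   returns the same buffer.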

4.2. Buffer Management in Unix:

(1). A set of NBUF I/O buffers, buf[NBUF]; each buf has 2 pointers:
       freePtr -> next buffer in a freelist;
       devPtr  -> next buffer in a devlist;

(2). Initially, all bufs are on the freelist. Whenever a buf is assigned to a 
     (dev,blk), it is taken out of the freelist and inserted into the devlist.
     If the buf is currently in use, it is marked BUSY, and removed from
     the freelist.  A BUSY buf may be in the I/O queue of a device. When a buf
     is no longer BUSY, it is released back to the freelist but remains in the
     devlist.

(3). getblk()/brelse() algorithms of Unix

     bp = getblk(dev,blk){
        loop:
        (1). search devlist for a bp = (dev,blk);
        (2). if (found such a bp){
                if bp is BUSY:{
                   mark bp WANTED; 
                   sleep on bp;
                   ************** 
                   goto loop;
                }
                /* bp not BUSY */
                take bp out of freelist; mark bp BUSY;
                return(bp); 
              } /* end found */
         ----------------------------------------------
        (3). /* not found; try to allocate a free buf from freelist */
             if (freelist empty){
                 mark freelist WANTED;
                 sleep on freelist;
                ************************
                 goto loop;
             }
        (4). /* at least one buf on freelist */
             take first bp out of freelist; 
             if (this bp is for DELAYed WRITE){ 
                 write bp out ASYNC;
                 *******************
                 goto loop; 
             }

        (5). mark bp BUSY; assigned bp to (dev,blk);
             relink bp to (new) devlist;
             return(bp);
     }

                             
     brelse(bp){
       if (bp is WANTed)
           wakeup() ALL sleeping on bp;
       if (freelist is WANTed)
           wakeup() ALL sleeping on freelist;
       put bp back to the (tail of ) freelist;
     }
--------------------------------------------------------------------------
              COMMENTS on Unix getblk()/brelse():

(1).  Data Consistency:
      In order to ensure data consistency, getblk() must never assign
      two buffers to the same (dev,blk).  =====> go to retry loop after 
      waking up from sleep() because what it wanted may already exist.
     
      During a WRITE operation, data are written to a buffer, which is
      marked DELWRI (Delayed Write) but remains in the buffer pool 
      until it is to be reassigned to a different (dev,blk). 

      Dirty buffers are written out before they are reassigned.

(2). Cache effect:
      Cache effect is achieved mainly by:
      brelse(bp) puts bp back to the (tail of) freelist but lets it remain in 
      the devlist and retain its (dev,blk) identity until it is grabbed for 
      reassignment.

      Once a bp is assigned to a specific (dev,blk), all efforts are 
      made to prolong its life, e.g. by 
        Delayed Write, and releasing to the tail, but grabbing from the
        front, of the freelist (Least Recently Used principle).

(3). Critical Regions:
      Disk interrupt handlers may manipulate the freelist, e.g. dequeue a bp 
      from a device IO-queue, change its status and call brelse(bp).
      So, in getblk()/brelse(), disk interrupts are masked out in these
      critical regions.

(4). Shortcomings of the algorithm:

     1. Inefficiency: the algorithm relies on re-try loops after sleep()/
        wakeup().
    
     2. No concurrent reads (for multiprocessor kernel).

     3. Possible starvation.

     4. Uses sleep()/wakeup(), which is good only for a uniprocessor kernel.
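
The freelist discipline noted in comment (2), release to the tail and grab from
the front, can be sketched as a singly linked queue. The names free_put() and
free_get() are hypothetical, and locking and interrupt masking are left out of
this illustration.

```c
#include <stddef.h>

struct buf {
    int dev, blk;
    struct buf *next_free;     /* corresponds to freePtr */
};

static struct buf *free_head, *free_tail;

/* brelse side: a released buffer goes to the TAIL of the freelist. */
void free_put(struct buf *bp)
{
    bp->next_free = NULL;
    if (free_tail) free_tail->next_free = bp;
    else           free_head = bp;
    free_tail = bp;
}

/* getblk side: a buffer for reassignment comes from the FRONT. */
struct buf *free_get(void)
{
    struct buf *bp = free_head;
    if (bp) {
        free_head = bp->next_free;
        if (!free_head) free_tail = NULL;
    }
    return bp;
}
```

The buffer at the front is the one released longest ago, so the most recently
released buffers keep their (dev,blk) identity the longest.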


4.3. Simple PV algorithm
     First, we define the following semaphores.

          BUFFER buf[NBUF];          // NBUF I/O buffers
          SEMAPHORE free = NBUF;     // a counting semaphore for FREE buffers
          SEMAPHORE buf[i].sem = 1;  // each buffer has a lock sem=1;

   For convenience, we shall refer to the semaphore of each buffer by the buffer
itself. As in the Unix algorithm, initially all buffers are in the freelist and
all device lists and I/O queues are empty. Most students tend to underestimate 
the problem and may come up with the following algorithm.

   BUFFER *getblk(dev, blk)
   {
       while(1){
    (1).   search dev_list for bp=(dev,blk);
    (2).   if (bp in dev_list){
    (3).      P(bp);
              remove bp from freelist;
              return bp;
           }
           // bp not in cache, create a bp=(dev,blk)
    (4).   P(free);     // get a free buffer;
    (5).   get a buffer from freelist;
           assign buffer to (dev,blk);
           return bp;
       }
   }        
 
   brelse(BUFFER *bp)
   {
    (6).   enter bp into freelist;
           V(bp);
           V(free);
   }

Unfortunately, such an algorithm is incorrect. To see this, assume that several
processes need the same buffer, which does not exist, and there are no more free
buffers. Then, all of them would be blocked at (4). When buffers are released as
free at (6), these processes would wake up to create the same buffer multiple 
times. To prevent multiple buffers, such processes would have to execute from 
(1) again, which amounts to retry. In addition, such an algorithm also has other
race conditions.

  The following shows a CORRECT simple PV-algorithm for buffer management.
   
   BUFFER *getblk(dev, blk)
   {
       while(1){
    (1).   P(free);              // get a free buffer first
           search dev_list for bp=(dev,blk);
    (2).   if (find such a bp){
    (3).       if (bp not BUSY){
                  remove bp from freelist;
                  P(bp);          // lock bp but does not wait     
                  return bp;
               }
               // bp in cache but BUSY
               V(free);         // give up the free buffer
    (4).       P(bp);           // wait in bp queue
               return bp;
           }
           // bp not in cache, try to create a bp=(dev,blk)
    (5).   bp = first buffer taken out of freelist;
           P(bp);               // lock bp, no wait
    (6).   if (bp dirty){
              awrite(bp);
              continue;         // continue to (1)
           }
    (7).   reassign bp to (dev,blk); // mark bp data invalid, not dirty
           return bp;
        }
   }        
 
   brelse(BUFFER *bp)
   {
    (8). if (bp queue has waiter)
            V(bp);
         else if (bp dirty && free queue has waiter)
    (9).         awrite(bp);
         else{
    (10).    enter bp into (tail of) freelist;
             V(bp);
             V(free);
         }
   }
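
   The three-way decision made by brelse() at (8)-(10) can be written as a
   small pure function. This is only a sketch for reasoning about the cases:
   the enum and the function name brelse_action() are hypothetical, and the
   waiter counts would in practice come from the semaphore queues.

```c
enum action { HAND_OVER, WRITE_OUT, RELEASE_FREE };

/* Mirrors steps (8)-(10) of brelse(): decide what to do with bp. */
enum action brelse_action(int bp_waiters, int dirty, int free_waiters)
{
    if (bp_waiters > 0)
        return HAND_OVER;        /* (8): V(bp), buffer stays in use */
    if (dirty && free_waiters > 0)
        return WRITE_OUT;        /* (9): awrite(bp) instead of freeing it */
    return RELEASE_FREE;         /* (10): freelist, V(bp), V(free) */
}
```

   A waiter on the buffer itself always takes priority, and a dirty buffer
   is released as free only when no process is waiting for a free buffer.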

   Next, we show that the simple PV-algorithm meets all the design requirements.

(1). Assigned buffers are unique:
     In getblk(), if there are free buffers, the process does not wait at (1). 
     Then it searches the dev_list. If the needed buffer already exists, the 
     process does not create the same buffer again. If the needed buffer does 
     not exist, the process creates the needed buffer by using a free buffer, 
     which it is guaranteed to have. Once the needed buffer is created, it will be
     in the dev_list so that no other process will create it again. If there are
     no free buffers, it is possible for several processes, which need the same
     buffer, to be blocked at (1). When a free buffer is released at (10), it 
     "wakes up" only one process blocked at (1), which allows only one process 
     to create the needed buffer. Therefore, multiple buffers cannot occur and 
     every assigned buffer is unique. 

(2). No retry loops: 
     The only place a process re-executes the while(1) loop is at (6), but that
     is not a retry loop because the process is continually executing.  

(3). No "unnecessary wakeups" of processes:
     In getblk(), a process may "wait" either for a free buffer at (1) or for 
     the needed buffer at (4). In either case, the process is not "woken up" to
     run again until it has a buffer. Furthermore, at (9), when a dirty buffer 
     is to be released as free but there are waiters for free buffers at (1), 
     the buffer is not released but written out directly. This avoids an 
     unnecessary process wakeup. The reader is encouraged to figure out why.
 
(4). No race conditions: 
     This is because, in a UP kernel, only one process can run at a time. The
     situation in which a buffer is intended for one process but grabbed by
     another process due to their order of execution cannot occur.

(5). Cache effect:
     Unlike the Unix algorithm, in which a released buffer is up for grabs, in
     the PV-algorithm, a buffer with waiters is always kept for reuse. Thus, a 
     buffer that is still in demand is never reassigned. A buffer is released as
     free only if it has no waiters. This should enhance the buffer's cache
     effect.

(6). No starvation for free buffers.
     In getblk(), if there are no free buffers, all requesting processes will 
     be blocked by P(free) at (1). This implies that while there are processes 
     waiting for free buffer, all buffers in use cannot admit any new users. 
     This guarantees that a BUSY buffer will eventually be released as free. 
     Therefore, starvation for free buffers cannot occur.

    The simple PV-algorithm works fine and is easy to implement. However, it 
does have the following two weaknesses. First, its cache effect may not be 
optimal. This is because as soon as there is no free buffer, all new requesting 
processes will be blocked at P(free), even if their needed buffer may already 
exist in the cache. Second, when a process wakes up from the free queue, it may
find the needed buffer already exists but is BUSY, in which case it will be
blocked again at (4) by P(bp). Strictly speaking, the process has been "woken up"
unnecessarily since it gets blocked twice. I do have another semaphore-based
OPTIMAL PV-algorithm, which does not have such weaknesses.

4.4. Simple PV_algorithm for SMP
Define SPINLOCK sbuf=0; SEMAPHORE free=NBUF; each buffer has a lock semaphore=1

struct buf *getblk(int dev, int blk)
{     
       while(1){
  (1).    P(free);                    // get a free buf 
  (2).    slock(sbuf);                // acquire the spinlock 
  (3).    if (bp in cache){
             if (bp is locked){
                V(free);              // give up the free buf
                sunlock(sbuf);        // MUST release sbuf spinlock first
                <<==================== time GAP: things can happen to bp
                P(bp);                // wait for bp
                if (bp changed){
                   V(bp);             // if bp changed, give up bp and retry  
                   continue;
                }
                return bp;
             }
             // bp in cache and is FREE
             out_freelist(bp);        // remove bp from freelist
             P(bp);                   // lock bp
             sunlock(sbuf);           // release spinlock
             return bp;
           }
           // bp not in cache; already has a free buf in hand
           bp = dequeue(freelist);    
           P(bp);                     // lock the buffer
           if (bp DIRTY){             // delayed write buf, can't use it
              awrite(bp);
              sunlock(sbuf);
              continue;             // continue while(1) loop
           }
           // bp is a new buffer; reassign it to (dev,blk)
           reassign bp to (dev,blk);  // bp may change dev list
           sunlock(sbuf);
           return bp;
        }
      }  

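Here, PV(s, spin) denotes an operation that atomically blocks the process on
semaphore s and releases the spinlock, similar to sleep(event, spinlock). Using
it in place of releasing the spinlock and then doing P(bp) separately would
close the time gap marked above.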

int brelse(struct buf *bp)
{
 (1). slock(sbuf);                    // acquire spinlock
      if (bp has waiter){             // bp has waiter
          V(bp);                      
          sunlock(sbuf);              // release spinlock
          return;
      }
 (2). if (bp DIRTY && free queue has waiter){
         awrite(bp);                  // write bp ASYNC
         sunlock(sbuf);               // release spinlock
         return;
      }
      enter_freelist(bp);             // release bp as FREE 
      V(bp); V(free);
      sunlock(sbuf);                  // release spinlock
}

In the modified PV_algorithm, since both getblk() and brelse() are executed in 
the same critical region of a spinlock, the buffer data structures do not need 
any additional protection. The modified PV_algorithm is therefore a trivial 
extension of the UP version. The algorithm works for SMP but it has a major 
drawback in that it does not allow for any concurrency. If we examine the I/O 
buffer management problem closely, we should see that it differs from pipes in 
an important aspect. In the I/O buffer management case, buffers are maintained 
in separate data structures, such as device lists and freelist. These separate
data structures lend themselves naturally to parallel operations.
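
One way to picture this is to give each list its own spinlock instead of one
global sbuf. The following is only a sketch of that idea, not an algorithm from
these notes: NDEV, devlock[], freelock and the sp_* helpers are hypothetical
names, and a complete design would also have to fix a locking order between the
lists.

```c
#include <stdatomic.h>

#define NDEV 4                        /* hypothetical number of devices */

static atomic_int devlock[NDEV];      /* one spinlock per device list */
static atomic_int freelock;           /* separate spinlock for the freelist */

/* Spinlock helpers in the style of slock()/sunlock(); 0 means unlocked. */
int  sp_trylock(atomic_int *x) { return atomic_exchange(x, 1) == 0; }
void sp_lock(atomic_int *x)    { while (!sp_trylock(x)) ; }
void sp_unlock(atomic_int *x)  { atomic_store(x, 0); }
```

With such locks, a process searching the list of device 0 does not delay a
process working on the list of device 1 or on the freelist; only processes that
touch the same list contend for the same lock.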
----------------------------------------------------------------------------