Developing Booter Programs

This section is devoted to the development of booter programs. If you are 
content with using an existing booter, the materials presented here are most
likely not of interest to you. But if you would like to know how booters work 
or intend to write your own booter, it is hoped that you will find the 
materials presented here useful. Before we start, it is worth pointing out the
unique features and requirements of booter programs.

1. A booter needs to manipulate CPU registers and make BIOS calls. Therefore,
assembly code is unavoidable. In fact, many booters, e.g. LILO, are written
entirely in assembly code. Since writing assembly programs is not easy, nor fun,
we shall use assembly code only when absolutely necessary and implement most of
the work in C.

2. When a PC starts, it is in the so-called real or unprotected mode. While
in this mode, the CPU can only execute 16-bit code and access 1 MB of memory.
To create a booter, we must use a software package that generates 16-bit code.
For example, we cannot use Linux's gcc because gcc generates 32-bit code, which
is unsuited to booting. Throughout this book, we shall use BCC under Linux as
the development platform. By default, the binary executable generated by BCC 
uses a single-segment memory model in which the code, data and stack segments 
are all the same. 

3. A booter differs from an ordinary executable program in the following 
aspects. Perhaps the most notable difference is their size. A booter's size 
(code plus static data) is usually extremely limited, e.g. 512 or 1024 bytes, 
in order to fit in one or two disk sectors. Multi-stage booters can be bigger 
but it is always desirable to keep a booter size small. When running an ordinary
program, an operating system is responsible for loading the entire program (at
least logically) into memory and setting up its execution environment before 
sending CPU to execute the program code. An ordinary program does not have to 
worry about these things. In contrast, when a booter starts, it only has the 
first 512 bytes loaded at (0x0000:7C00). If the booter is bigger than 512 bytes,
which is usually the case, it must load the remaining parts in by itself. If the
booter's initially loaded memory area is needed by the OS image, it must move 
to, and execute from, a different location to avoid being crushed by the 
incoming OS image. In addition, a booter must manage its own execution 
environment, such as setting up CPU's segment registers and a stack.

4. A booter cannot use the standard library I/O functions, such as gets() and
printf(), because these functions depend on the support of an operating
system, but there is no operating system yet during booting. Therefore, a booter
must implement its own I/O functions by using only BIOS.

5. When developing an ordinary program, we can use a variety of tools, e.g. gdb,
for debugging. In contrast, there is almost no tool for debugging a booter. If 
something goes wrong, the machine simply stops with little or no clue as to
where and why the error occurred. This makes the development of booters much
harder than writing ordinary programs.

1a. Booter for booting MTX and Linux from FD sectors.

Our first booter is to boot MTX from a FD boot disk. The disk contents layout 
is shown in Figure  .

     | S0   | S1 S2 ...    |     unused sectors           |
     ----------------------------------------------------- 
     |booter| MTX kernel   | .............................|
          
           Figure  . MTX Boot Disk layout

It contains a booter in Sector 0, followed by an MTX kernel image in consecutive
sectors. In the MTX kernel, word 0 is a jmp instruction, word 1 is the code 
section size in 16-byte clicks and word 2 is the data section size in bytes. 
During booting, a booter may use these values to determine the number of sectors
of the MTX kernel to load. The loading segment address is 0x1000.
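
As a rough illustration of this computation, the sketch below converts the two
header words into a sector count. It is ordinary host-side C rather than BCC
code. The function name kernel_sectors and the 32-bit intermediate value are
assumptions made here for clarity; they are not part of the booter itself.

```c
#include <stdint.h>

/* Hypothetical helper: number of 512-byte sectors occupied by an MTX
 * kernel whose header gives tsize in 16-byte clicks and dsize in bytes.
 * A 32-bit intermediate avoids overflow when tsize*16 exceeds 64 KB. */
uint16_t kernel_sectors(uint16_t tsize_clicks, uint16_t dsize_bytes)
{
    uint32_t bytes = ((uint32_t)tsize_clicks << 4) + dsize_bytes;
    return (uint16_t)((bytes + 511) / 512);   /* round up to whole sectors */
}
```

For example, a code size of 0x100 clicks (4096 bytes) with no data occupies
8 sectors.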

The booter consists of two files, a bs.s file in assembly and a bc.c file in C.
Under Linux, use BCC commands to generate a binary executable and dump it to 
the beginning of a floppy disk, as in

      as86 -o bs.o bs.s      # assemble bs.s into bs.o
      bcc  -c bc.c           # compile  bc.c into bc.o
      # link bs.o and bc.o into a binary executable without file header
      ld86 -d bs.o bc.o /usr/lib/bcc/libc.a 
      # dump a.out to sector 0 of a FD
      dd if=a.out of=/dev/fd0 bs=512 count=1

Instead of entering individual commands, the building process can be automated 
by using a makefile or a sh script. For simple compile-link tasks, a sh script 
is adequate and actually more convenient. For example, we can re-write the above
commands as a sh script, mk, which takes a filename as parameter.

      as86 -o $1s.o $1s.s    
      bcc  -c $1c.c
      ld86 -d -o $1 $1s.o $1c.o /usr/lib/bcc/libc.a 
      dd if=$1 of=/dev/fd0 bs=512 count=1

Then, mk b1 creates b1 from b1s.s and b1c.c, mk b2 creates b2 from b2s.s and 
b2c.c, etc. In the following, we shall assume such a sh script.

The assembly code of bs.s is shown first.

!============================ bs.s file ====================================
.globl begtext, begdata, begbss   

.text                             ! code, data, bss 
begtext:                          ! sections are
.data                             ! all the same.
begdata:
.bss
begbss:
.text                             ! code section begins here
!==========================================================================
        .globl  _main,_prints,_NSEC                  ! IMPORTed from C
        .globl  _getc,_putc,_readfd,_setes           ! EXPORT to C
        .globl  _inces,_error                        ! also EXPORTed to C

        BOOTSEG  =  0x9800        ! booter segment
	OSSEG    =  0x1000        ! MTX segment
        SSP      =  32*1024       ! stack size
        BSECTORS =  2             ! number of sectors to load initially	
        !----------------------------------------------------------------
        ! boot SECTOR loaded at (0000:7C00). reload to segment 0x9800
        !----------------------------------------------------------------
start:
	mov  ax,#BOOTSEG          ! set ES to 0x9800
        mov  es,ax

! call BIOS to load BSECTORS of FD0 to (segment,offset)=(0x9800,0)
        xor  dx,dx                ! dh=head=0,    dl=drive=0
        xor  cx,cx                ! ch=cyl=0,     cl=sector=0
        incb cl                   ! sector=1 (BIOS counts sector from 1)
        xor  bx,bx                ! (ES,BX)= real address = (0x9800,0)
        movb ah,#2                ! ah=READ
	movb al,#BSECTORS         ! al=number of sectors to load 

        int  0x13                 ! call BIOS disk I/O function  

! far jump to (0x9800, next) to continue execution there
        jmpi next, BOOTSEG        ! CS=BOOTSEG, IP=next

next:
        mov  ax,cs                ! set up CPU segment registers 
        mov  ds,ax                ! we know ES,CS=0x9800. Let DS=CS  
        mov  ss,ax                ! SS = CS ===> all point at 0x9800
        mov  sp,#SSP              ! SP = SS + 32 KB

        call _main                ! call main() in C

        jmpi 0,OSSEG              ! jump to execute OS kernel at (OSSEG,0)
	
!======================== I/O functions =================================
! char getc(): return an input char
_getc:
        xorb ah,ah                ! clear ah
        int  0x16                 ! call BIOS to get a char in AX
        ret 

! void putc(char c) : print a char
_putc:           
        push bp
	mov  bp,sp
        movb al,4[bp]             ! aL = char
        movb ah,#14               ! aH = 14
        int  0x10                 ! call BIOS to display the char
        pop  bp
	ret
        
! int readfd(cyl, head, sector): load _NSEC sectors to (ES,0)
!             4     6     8    : parameter offsets in stack  
_readfd:                             
        push  bp
	mov   bp,sp            ! bp = stack frame pointer

        movb  dl, #0x00        ! drive=0 = FD0
        movb  dh, 6[bp]        ! head
        movb  cl, 8[bp]        ! sector
        incb  cl               ! inc sector by 1 to suit BIOS
        movb  ch, 4[bp]        ! cyl
        xor   bx, bx           ! BX=0
        movb  ah, #0x02        ! READ 
	movb  al, _NSEC        ! read _NSEC sectors to (ES,BX)
 
        int   0x13             ! call BIOS to read disk sectors
        jb   _error            ! error if CarryBit is set (read failed)
	pop  bp                
	ret

! setes(segment) : set  ES to segment        
_setes:  push  bp
	 mov   bp,sp
         mov   ax,4[bp]        
         mov   es,ax
	 pop   bp
	 ret

! inces() : increment ES by _NSEC sectors (in 16-byte clicks)
_inces:
        mov  bx,_NSEC          ! get _NSEC in BX
	shl  bx,#5             ! multiply by 0x20 = 32
	mov  ax,es             ! current ES
	add  ax,bx             ! add (_NSEC*0x20)
	mov  es,ax             ! update ES
        ret

!  error() & reboot function
_error:
        push #msg
        call _prints
        int  0x19              ! reboot
msg:    .asciz  "Error"
!------------------------- end of bs.s file ---------------------------------

Explanations of the Assembly Code.

The assembly code syntax is that of BCC's assembler, as86. When generating
object code, BCC's C compiler prefixes every identifier with an underscore, e.g.
main() becomes _main, getc() becomes _getc, etc. BCC's assembler uses the same
naming convention. Assembly code may import/export global symbols from/to C
by .globl statements. Likewise, a C program may reference global symbols in
assembly by declaring them as extern.
 
The assembly code begins with a standard header, which defines text, data and 
bss sections for the assembler. The actual code begins at the symbol start, 
which is the entry point of the booter program.  
 
During booting, BIOS loads sector 0 of the booter to the segment 0x07C0 and 
jumps to there to execute the booter. We assume that the booter may be larger 
than 512 bytes and that it must be relocated to a different memory area. Instead
of moving, the code calls BIOS INT 0x13 to load the first 2 sectors of the disk
to the segment 0x9800. The reason for loading 2 (or more) sectors will become 
clear shortly. After loading the booter again to the new segment, it does a far
jump, jmpi next, 0x9800, which sets the CPU's (CS,IP)=(0x9800, next), causing 
the booter to continue execution from the offset next in the segment 0x9800. 
The choice of 0x9800 is based on the principle that the booter should be 
relocated to a high memory area with enough space to run and leave as much space
as possible in the low memory area for the OS image. The segment 0x9800 is 32 KB
below the ROM area, which begins at the segment 0xA000. This gives the booter a
32 KB address space, which should be big enough even for a general booter.
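
The address arithmetic behind this choice can be checked with a small sketch.
In real mode a (segment, offset) pair maps to the real address
segment*16 + offset. The helper names real_addr and bytes_between are
hypothetical and exist only for this illustration; the code is host-side C,
not part of the booter.

```c
#include <stdint.h>

/* Real-mode address translation: real = segment * 16 + offset. */
uint32_t real_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Bytes between the start of segment lo and the start of segment hi. */
uint32_t bytes_between(uint16_t lo_segment, uint16_t hi_segment)
{
    return real_addr(hi_segment, 0) - real_addr(lo_segment, 0);
}
```

With these, bytes_between(0x9800, 0xA000) is 32768, i.e. the 32 KB available
to the booter, and real_addr(0x07C0, 0) is 0x7C00, the address at which BIOS
loads the boot sector.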

When execution continues, both ES and CS already point at the segment 0x9800. 
The code proceeds to set DS and SS to 0x9800 also in order to comply with the 
one-segment memory model requirement. Then it sets the stack pointer to 32 KB 
above SS. The run-time memory image of the booter is shown in Figure 3-3.

               0x9800                0xA000
            -----------------------------------
               |code|data|bss| stack | /// ROM area
            -----------------------------------
               ^                     ^
               |<------ 32 KB ------>|             
          CS,DS,SS,ES                SP 

                Figure 3-3. Run-time image of booter

It is reported that some newer BIOS may use the memory area above 0x9A00 as the
Extended BIOS Data Area, which may conflict with the booter's memory set up
shown above. If so, we may set the booter's segment to a lower value, e.g. 
0x9400 or reduce its stack size. If the booter's initial loading area is
not needed by the incoming OS image, we may re-load the entire booter to the 
segment 0x07C0 and let it run in that segment. In that case, all we have to do
is to change BOOTSEG to 0x07C0. In the following, we shall assume that the 
booter runs in the segment 0x9800.

With a stack, the program can start to make calls. It calls main() in C, which 
implements the actual work of the booter. Upon return from main(), the CPU is
sent to execute the loaded MTX kernel at the segment 0x1000.

The remaining assembly code contains I/O functions based on BIOS, where getc()
returns an input char from the keyboard and putc(c) displays a char to the 
screen. The functions readfd(), setes() and inces() are as follows. To load an
OS image, a booter must be able to load disk sectors. BIOS provides disk I/O 
functions via INT 0x13, which takes parameters in CPU registers:

         DH=head,     DL=drive
         CH=cyl,      CL=sector (count from 1)
         AH=2(READ),  AL=number of sectors to read (at most a track or cylinder)
         Memory address (segment,offset)=(ES,BX)
         return status : carry bit=0 means no error, 1 means error.

The function readfd(cyl,head,sector) calls BIOS int 0x13 to load NSEC sectors 
into memory, where NSEC is a global imported from C. The zero-counted disk 
parameters, (cyl,head,sector), are computed in C. Since BIOS counts sectors 
from 1, the sector value is incremented by 1 to suit BIOS. When loading disk
sectors, BIOS uses (ES,BX) to determine the real memory address. Since BX=0,
the loading address is (ES,0). Thus, ES must be set, by the setes(segment)
function, to a desired loading segment prior to calling readfd(). The function
code loads the parameters into CPU registers and issues INT 0x13. After loading
NSEC sectors, inces() may be used to increment ES by NSEC sectors (in 16-byte
clicks) for loading the next NSEC sectors, etc. The error() function is used to
trap any errors during booting. It prints an error message, followed by reboot.
The use of NSEC as a global rather than a parameter serves two purposes. First,
it illustrates the cross reference of globals from either assembly or C code.
Second, if a value does not change often, it makes little sense to pass it as a
function parameter because doing so would increase the code size. Since the
booter size is limited to 512 bytes, saving even a few bytes could make a
difference between success and failure.
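
To make the register layout of the INT 0x13 read call concrete, the following
host-side sketch packs the parameters the way readfd() places them in
AX, BX, CX and DX. The struct int13_regs and the function pack_read are
hypothetical names used only here. The sketch assumes a floppy drive, so the
cylinder fits in CH and the drive number is 0.

```c
#include <stdint.h>

/* Register image for a BIOS INT 0x13 read (AH=2). Hypothetical struct. */
struct int13_regs {
    uint16_t ax, bx, cx, dx;
};

/* cyl, head and sector0 are counted from 0, as computed in the C code. */
struct int13_regs pack_read(uint8_t cyl, uint8_t head, uint8_t sector0,
                            uint8_t nsec, uint16_t offset)
{
    struct int13_regs r;
    r.ax = (uint16_t)((0x02 << 8) | nsec);       /* AH=2 (READ), AL=count   */
    r.bx = offset;                               /* load address = (ES,BX)  */
    r.cx = (uint16_t)((cyl << 8) |
                      (uint8_t)(sector0 + 1));   /* CH=cyl, CL=sector from 1 */
    r.dx = (uint16_t)(head << 8);                /* DH=head, DL=drive 0     */
    return r;
}
```

For the initial load in bs.s (cylinder 0, head 0, first sector, 2 sectors)
this gives AX=0x0202, CX=0x0001 and DX=0x0000.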

Next we show the C code.

/************ bc.c file of MTX booter *******/
typedef unsigned char  u8;
typedef unsigned short u16;

#define TRK  18
#define CYL  36

u16 tsize, dsize, ksectors, i;
u16 NSEC = 1;

int prints(s) char *s;
{
  while (*s)
    putc(*s++);
}

int getsector(sector) u16 sector;
{
  readfd(sector/CYL, ((sector)%CYL)/TRK, (((sector)%CYL)%TRK));
}

main()
{
  prints("booting MTX\n\r");
  
  tsize = *(u16 *)(512+2);
  dsize = *(u16 *)(512+4);
  
  ksectors = ((tsize << 4) + dsize + 511)/512;

  setes(0x1000);
 
  for (i=1; i<=ksectors+1; i++){
      getsector(i);
      inces(); 
      putc('.');
  }
  prints("\n\rready to go?"); getc();
}

Explanations of The C Code

Disk sectors are numbered linearly as 0,1,2, ... However, BIOS INT 0x13 only accepts
disk parameters in (cyl,head,sector) or CHS format. To load disk sectors, we 
must convert the starting sector number into CHS format.  Figure 3-4 shows the 
relationship between linear sector and CHS addressing of a floppy disk.

     |S0 ....  S17|S18 ....  S35|S36 ... S53|S54 ... S71|.........
     |   head=0   |    head=1   |   head=0  |  head = 1 |
     |<------- cyl = 0 -------->|<----- cyl = 1 ------->| etc.

        Figure 3-4 Linear Sector and CHS addressing

From the above diagram, it is easy to see that the conversion of a sector 
number, sec, into CHS can be done by using the Mailman's algorithm.

        cyl    =  sec / 36;
        head   = (sec % 36) / 18;
        sector = (sec % 36) % 18; 
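
The three formulas above can be packaged as a small function. The sketch
below is host-side C with hypothetical names (struct chs, linear_to_chs) and
assumes the 1.44 MB floppy geometry used throughout: 2 heads and 18 sectors
per track.

```c
#include <stdint.h>

#define SECTORS_PER_TRACK 18
#define SECTORS_PER_CYL   36    /* 2 heads x 18 sectors */

struct chs { uint16_t cyl, head, sector; };   /* all counted from 0 */

/* Convert a linear sector number into (cyl, head, sector). */
struct chs linear_to_chs(uint16_t sec)
{
    struct chs p;
    p.cyl    = sec / SECTORS_PER_CYL;
    p.head   = (sec % SECTORS_PER_CYL) / SECTORS_PER_TRACK;
    p.sector = (sec % SECTORS_PER_CYL) % SECTORS_PER_TRACK;
    return p;
}
```

For instance, sector 35 maps to (0,1,17), the last sector of cylinder 0,
and sector 36 maps to (1,0,0), the first sector of cylinder 1.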
 
Based on these, we implement a getsector() function in C, which calls readfd()
in assembly, for loading disk sectors. In the C code, the prints() function, 
which is based on putc() in assembly, is for printing message strings. As 
specified, on the boot disk the MTX kernel begins from sector 1, in which word 1
is the tsize of the MTX kernel (in 16-byte clicks) and word 2 is its dsize in 
bytes. Sectors 0 and 1 are already loaded at 0x9800 when the booter starts to
execute the C code. While in main(), the program's segment is CS=DS=SS=0x9800. 
Thus, words 1 and 2 of sector 1 are now at the (offset) addresses 512+2 and 512+
4, respectively. The C code extracts these numbers to compute the number of 
sectors of the MTX kernel to load. It then sets ES to the segment 0x1000 and 
loads the MTX sectors by a loop. The loading scheme resembles that of a 
"sliding window". Each iteration calls getsector(i) to load NSEC sectors 
beginning at sector i to the memory segment pointed to by ES. After loading NSEC 
sectors to the current segment, ES is incremented by NSEC sectors for loading 
the next NSEC sectors, etc. Since NSEC=1, this amounts to loading the MTX kernel
by individual sectors. Faster loading schemes will be shown later.
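
The movement of the loading window can be sketched in host-side C. The
function next_es mirrors what inces() does in assembly: one 512-byte sector
is 0x20 clicks, so ES advances by NSEC*0x20. Both function names are
hypothetical and serve only as an illustration.

```c
#include <stdint.h>

/* One 512-byte sector spans 512/16 = 0x20 clicks. */
#define CLICKS_PER_SECTOR 0x20

/* What inces() does to ES after nsec sectors have been loaded. */
uint16_t next_es(uint16_t es, uint16_t nsec)
{
    return (uint16_t)(es + nsec * CLICKS_PER_SECTOR);
}

/* ES after loading `count` chunks of nsec sectors, starting at start_es. */
uint16_t es_after(uint16_t start_es, uint16_t nsec, uint16_t count)
{
    uint16_t es = start_es;
    while (count--)
        es = next_es(es, nsec);
    return es;
}
```

Starting from 0x1000 with NSEC=1, ES becomes 0x1020 after the first sector
and reaches 0x2000 after 128 sectors, i.e. after exactly 64 KB.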

Next, we consider booting "small" Linux images. Bootable Linux images are
generated by the following steps. Under Linux,

      . cd to linux source code tree directory (/usr/src/linux)
      . create a .config file, which guides make (e.g. by make config)
      . run make zImage (or make bzImage) to generate a bootable kernel image.

make zImage generates a "small" bootable Linux image, named zImage, in which 
the (compressed) kernel size is 512 KB or less. You can generate a zImage only 
with Linux kernel versions 2.4 or earlier. Even so, you may have to enable a 
minimal set of options in .config and compile most device drivers as modules. 
Otherwise, you can only use make bzImage to generate a "large" Linux kernel,
which requires a different loading scheme during booting. We shall discuss how 
to boot large Linux kernels later. Regardless of size, a bootable Linux image 
is composed of three contiguous pieces.

          |BOOT| SETUP  | (compressed) linux kernel|
          ----------------------------------------- 
 sector : | 0  | 1 to n |  n+1 , (kernel size).....|
 
where BOOT is a booter for booting Linux from floppy disk and SETUP is for 
setting up the start up environment of the Linux kernel. The number of SETUP 
sectors, n, varies from 4 to 10. In addition to code, BOOT also contains the 
following boot parameters.

     offset              Contents
    --------   -----------------------------------
    byte 497    number of SETUP sectors
    byte 498    root dev flags: nonzero=READONLY
    word 500    linux kernel size in (16-byte) clicks
    word 504    old ramdisk information
    word 506    video mode
    word 508    root device=(major, minor)
   ------------------------------------------------

Most of the parameters can be altered by the utility program rdev. You may read
Linux's man page of rdev for more information. A complete list of Linux boot 
parameters is available in Linux's Documentation/x86/i386/boot.txt.
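
As an illustration of how a booter might pick up these parameters, the sketch
below reads three of them from a 512-byte copy of the BOOT sector. It is
host-side C; the names rd16, struct boot_params, read_boot_params and
kernel_sectors_from_clicks are hypothetical. Words are assumed to be stored
little-endian, as on the PC. Unlike the booter code in this section, which
simply shifts right by 5, this sketch rounds the sector count up.

```c
#include <stdint.h>

/* Read a little-endian 16-bit word from a byte buffer. */
uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

struct boot_params {            /* hypothetical container, not a Linux type */
    uint8_t  setup_sectors;     /* byte 497 */
    uint16_t kernel_clicks;     /* word 500 */
    uint16_t root_dev;          /* word 508 */
};

/* Extract the parameters from a copy of the BOOT sector. */
struct boot_params read_boot_params(const uint8_t *boot)
{
    struct boot_params bp;
    bp.setup_sectors = boot[497];
    bp.kernel_clicks = rd16(boot + 500);
    bp.root_dev      = rd16(boot + 508);
    return bp;
}

/* Kernel size in 512-byte sectors: 32 clicks per sector, rounded up. */
uint16_t kernel_sectors_from_clicks(uint16_t clicks)
{
    return (uint16_t)((clicks + 31u) >> 5);
}
```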

A Linux zImage is intended to be dumped to a FD as a boot disk. During booting,
BIOS loads the boot sector, BOOT, into memory and executes it. BOOT relocates 
itself to the segment 0x9000 and jumps to there to continue execution. Then, 
it loads SETUP to the segment 0x9020, i.e. immediately above BOOT, and the Linux
kernel to 0x1000. When loading completes, it jumps to 0x9020 to execute SETUP, 
which starts up the Linux kernel. Thus, the loading requirements of a Linux 
zImage are:
                BOOT+SETUP  : 0x9000      
                Linux Kernel: 0x1000
and the start up code, SETUP, is at 0x9020. Based on these, our Linux zImage
booter essentially duplicates exactly what linux's BOOT does.

To demonstrate the booter, we create a boot disk by dumping a Linux zImage to 
sector 1 by dd if=zImage2.4 of=/dev/fd0 bs=512 seek=1, where zImage2.4 is a 
Linux zImage based on kernel 2.4.31. Then use mk to install a Linux booter in 
sector 0. The resulting boot disk layout is

            |<------ Linux zImage ------->|
     | S0   | S1 S2 ...                   |   unused     |
     ----------------------------------------------------- 
     |booter|BOOT|SETUP|KERNEL......      |............. |

           Figure  . Linux Boot Disk layout

The MTX booter, which we developed earlier, can be adapted easily to booting 
Linux from such a boot disk. In the assembly code, we only need to change 
OSSEG to 0x9020 so that, upon return from main(), it will execute SETUP at 
0x9020. The C code is almost the same as that of the MTX booter. Therefore, 
we only show the modified main() function.

//  C code for Linux zImage booter
u8   setup;                         
u16  ksectors;                      

main()
{
  int i;
  prints("boot linux\n\r");

  setup = *(u8 *)(512+497);            // number of SETUP sectors   
  ksectors = *(u16 *)(512+500) >> 5;   // number of kernel sectors

  setes(0x9000);                       // load BOOT+SETUP to 0x9000

  for (i=1; i<=setup+ksectors+2; i++){ // S0+BOOT = 2 sectors before SETUP
    getsector(i);                      // load sector i
    i<=setup ? putc('*') : putc('.');  // for each sector loaded, show * or .
    inces();                           // inc ES by NSEC sector (clicks) 
    if (i==setup+1)                    // set ES=0x1000 for loading kernel
       setes(0x1000);
  }
  prints("\n\rready to go?"); getc();
}

In the zImage2.4 image, the root device (a word at offset 508) is (2,0), i.e. 
the first FD drive. When Linux boots up, it will try to mount (2,0) as the root
device. Since the boot disk is not a valid file system, the mount will fail and
the Linux kernel will display a message
          Kernel panic: VFS: Unable to mount root fs 02:00 
and stop. To make the Linux kernel runnable, you may use rdev to change the root
device parameter to a device containing a valid Linux file system. For example,
assume that you have a Linux system installed in partition 2 of a hard disk. By
setting the root device in a zImage to 0x0302, Linux will boot up and run 
successfully. The drawback of using rdev is that it changes the boot parameters
in an image file. It would be better to have an "intelligent" booter, which
modifies loaded boot parameters in memory. This technique will be shown later.
Another way to make the Linux kernel runnable is to include a ramdisk image in 
the tail part of the boot disk and configure the Linux kernel to load the 
ramdisk image as root file system. Details of how to boot Linux with ramdisk 
image are left as an exercise (Problem # ).    
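
The root device values quoted above, such as (2,0) and 0x0302, follow the
classic 16-bit Linux device number format, with the major number in the high
byte and the minor number in the low byte. A minimal sketch, using
hypothetical helper names:

```c
#include <stdint.h>

/* Classic 16-bit Linux device number: high byte = major, low byte = minor. */
uint8_t dev_major(uint16_t dev) { return (uint8_t)(dev >> 8); }
uint8_t dev_minor(uint16_t dev) { return (uint8_t)(dev & 0xFF); }

/* Build a device number from its (major, minor) pair. */
uint16_t make_dev(uint8_t major, uint8_t minor)
{
    return (uint16_t)((major << 8) | minor);
}
```

Thus make_dev(2,0) is 0x0200, the first FD drive, and 0x0302 decodes to
(3,2), partition 2 of the first IDE hard disk.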

Fast Loading Schemes.

The above booters load an OS image by loading disk sectors one at a time. For 
small OS images, e.g. MTX kernel, this works fine. For large OS images like 
Linux, it would be too slow to be acceptable. A faster loading scheme is more 
desirable. How fast can we load? When booting a Linux zImage, logically and 
ideally only two loading operations are needed, as in

     setes(0x9000);  NSEC = setup+1;   getsector(1);
     setes(0x1000);  NSEC = ksectors;  getsector(setup+2);

Unfortunately, things are not so simple due to limitations in hardware. Most 
floppy drives support reading a full track at a time. Some drives may support 
reading a complete cylinder. The discussion here assumes 1.44 MB drives that 
support reading cylinders. The first problem is that when loading FD sectors 
the sectors cannot cross any cylinder boundary. For example, from the sector 
number 34 (count from 0), loading 1 or 2 sectors is OK but attempting to load 
more than 2 sectors would result in an error. This is because sectors 34-35 are
in cylinder 0 but sector 36 is in cylinder 1; going from sector 35 to 36 crosses
a cylinder boundary, which is not allowed by the drive hardware. This means that
each time we can load at most a full cylinder of 36 sectors. The second problem
is the famous, or infamous, depending on your view, cross 64KB boundary problem
[ ], which states that while loading FD sectors, the real memory address cannot
cross any 64KB boundary. For example, from the real address 0x0FE00, if we try 
to load 2 sectors, the second sector would be loaded to the addresses between 
0x0FE00+0x200 = 0x10000 and 0x0FE00+0x400 = 0x10200, which crosses the 64KB 
boundary at 0x10000. The problem is caused by the DMA controller used by FD
drives. When loading the first sector, the loading address is 0x0FE00. The DMA
controller expands the address to 24 bits, with the high byte=0x0000 as the
segment and the low 16 bits=0xFE00 as the offset. When loading the second sector,
for some strange reason, the DMA controller does not increment the high byte,
which remains 0x0000, even though the intended address has incremented to
0x0001:0000. In this case, loading may still occur but only to the wrong place.
Instead of the intended address, the second sector would be loaded to
0x0000:0000, overwriting whatever is there (the interrupt vectors), which
effectively kills the BIOS. Had IBM designed the DMA controller correctly, there
would be no cross 64KB boundary problem, but that's another story. In short,
when loading OS images, a booter must avoid both problems.
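
Both restrictions can be expressed as simple predicates. The sketch below is
host-side C with hypothetical function names; it assumes 512-byte sectors,
36 sectors per cylinder and a request of at least one sector.

```c
#include <stdint.h>

#define SECTOR_SIZE     512u
#define SECTORS_PER_CYL 36u

/* Nonzero if reading nsec sectors starting at linear sector `sec`
 * would run past the end of its cylinder. */
int crosses_cylinder(uint32_t sec, uint32_t nsec)
{
    return sec / SECTORS_PER_CYL != (sec + nsec - 1) / SECTORS_PER_CYL;
}

/* Nonzero if nsec sectors loaded at real address `addr` would touch
 * bytes on both sides of a 64 KB boundary. */
int crosses_64k(uint32_t addr, uint32_t nsec)
{
    uint32_t last = addr + nsec * SECTOR_SIZE - 1;   /* last byte written */
    return (addr >> 16) != (last >> 16);
}
```

With the examples from the text, crosses_cylinder(34, 2) is 0 but
crosses_cylinder(34, 3) is nonzero, and crosses_64k(0x0FE00, 1) is 0 but
crosses_64k(0x0FE00, 2) is nonzero.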
 
A trivial way to avoid both problems is to load disk sectors one by one as we 
have done so far. Obviously, loading one sector at a time will never cross any 
cylinder boundary. If the loading segment starts at a sector boundary (divisible
by 0x20) it will also never cross any 64KB boundary. Similarly, if the OS image
starts from a (1KB) block boundary on disk and the loading segment also begins 
from a block boundary, then loading 2-sector blocks would also work. Following
this line of reasoning, it is easy to see that the maximal number of sectors we
can load each time is only 4. The reader is encouraged to prove this. Can we do
still better? The answer is yes. Many existing boot-loaders try to load by 
tracks. Here we present a fast loading scheme, called the "cross-country" 
algorithm, which loads by cylinders. The algorithm resembles the behavior of a 
cross country runner negotiating an obstacle course. When there is open space, 
the runner takes full strides (load cylinders) to run fast. When there is an 
obstacle ahead, the runner slows down by taking half stride (load partial 
cylinder) until the obstacle is reached. After clearing the obstacle and 
regaining his/her balance, the runner resumes fast running again, etc. 
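
The limit of 4 sectors for fixed-size loading can be checked by brute force.
The sketch below is host-side C with hypothetical names. It assumes that
chunk k starts at disk sector k*n and is loaded k*n sectors past a
64 KB-aligned address, which is the aligned case described above. It is a
numerical check, not a proof.

```c
#include <stdint.h>

#define SECTORS_PER_CYL 36u     /* 1.44 MB floppy: 2 heads x 18 sectors */
#define SECTORS_PER_64K 128u    /* 64 KB / 512 bytes */

/* Nonzero if loading in chunks of n sectors never straddles a cylinder
 * or a 64 KB boundary under the alignment assumption stated above. */
int chunk_is_safe(uint32_t n)
{
    uint32_t start;
    if (n == 0)
        return 0;
    /* A bad chunk size already fails at the first boundary of either
     * kind, so scanning 36*128 sectors is more than enough. */
    for (start = 0; start < SECTORS_PER_CYL * SECTORS_PER_64K; start += n) {
        uint32_t last = start + n - 1;
        if (start / SECTORS_PER_CYL != last / SECTORS_PER_CYL) return 0;
        if (start / SECTORS_PER_64K != last / SECTORS_PER_64K) return 0;
    }
    return 1;
}

/* Largest safe fixed chunk size, in sectors, up to one cylinder. */
uint32_t max_safe_chunk(void)
{
    uint32_t n, best = 0;
    for (n = 1; n <= SECTORS_PER_CYL; n++)
        if (chunk_is_safe(n))
            best = n;
    return best;
}
```

The safe sizes are exactly those that divide both 36 and 128, so
max_safe_chunk() returns 4.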

The following C code shows a Linux zImage booter that implements the cross 
country algorithm, where getes() is an assembly function which returns the
current ES segment value. In order to keep the booter within 512 bytes,
prints() has been downgraded to print only one char. The booter prints a 'C'
for each cylinder loaded and a | whenever a 64KB boundary is crossed. The
booter size is 508 bytes or 484 bytes with prints() as an empty function.

/******************** Cross Country Algorithm ***************************:
 Load kernel zImage by cylinders. A bootable FD contains 

   S0 S1 ...      ..|Ssetup+2 ...............               |
   |<- setup + 2  ->|<----  Linux kernel image (<512KB)---->|

  Goal : load linux kernel image to 0x1000 without crossing either cylinder 
         or 64KB boundary.

                            Algorithm:
  Load by cylinder will never cross any cylinder boundary. If a cylinder is 
  about to cross 64KB segment, compute NSEC = max. number of sectors that can 
  be loaded without crossing 64KB. Load NSEC sectors. Then load the remaining 
  CYL-NSEC sectors. Then load by cylinder again, etc
 ***************************************************************************/

typedef unsigned char  u8;
typedef unsigned short u16;

#define TRK  18
#define CYL  36

u16 setup, ksectors,ES;
u16 csector = 1;     // current loading sector
u16 NSEC = 35;       // initial number of sectors to load >= BOOT+SETUP

int prints(s) char *s;
{
  putc(*s);          // degrade prints() to keep booter size to 512 bytes.
}

int getsector(sector) u16 sector;
{
   readfd( sector/CYL,((sector)%CYL)/TRK,(((sector)%CYL)%TRK));
   csector += NSEC;
   inces();
}

main()
{
  putc('S');          // Show Start

  setes(0x9000);
  getsector(1);       // load Linux's [boot+SETUP] to 0x9000

  // current sector = SETUP's sector count (at offset 512+497) + 2 
  setup   = *(u8 *)(512+497) + 2;
  ksectors = (*(u16 *)(512+500)) >> 5;

  NSEC = CYL - setup; // sectors remain in cylinder 0
  
  setes(0x1000);      // Linux kernel is loaded at segment 0x1000
  getsector(setup);   // load the remaining sectors of cylinder 0

  // we are now at beginning of cyl#1
  csector = CYL;

  while (csector < ksectors+setup){
                      // try to load by cylinders of 36 sectors each
    ES = getes();     // current ES value

    if ( ((ES + CYL*0x20) & 0xF000) == (ES & 0xF000)){ //still same segment
        NSEC = CYL;         // load a full cylinder 
        getsector(csector);
	putc('C');          // show loaded a cylinder
    }
    else{                   // this cyl will cross 64K
        NSEC = 1;
        while( ((ES + NSEC*0x20) & 0xF000) == (ES & 0xF000) )
	     NSEC++;        // number of sectors can still load
        getsector(csector);

	NSEC = CYL - NSEC;  // load remaining sectors of this cyl 
        if (NSEC){          // only if this cyl has sectors remaining
           getsector(csector);
        }
        putc('|'); 
    }
  }
  putc('E');                // Show End of loading
  getc();
}
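
To see which loads the algorithm issues, the following host-side sketch plans
the kernel loads without touching a disk. It is a simplified restatement, not
the booter code above: each step loads the smaller of "rest of the current
cylinder" and "room left before the next 64 KB boundary". The names
struct load, room_to_64k and plan_loads are hypothetical. As in the booter,
whole cylinders are read even if the kernel ends earlier.

```c
#include <stdint.h>

#define CYL 36u

struct load { uint16_t sector, nsec, es; };

/* Sectors that fit between segment es and the next 64 KB boundary. */
static uint16_t room_to_64k(uint16_t es)
{
    return (uint16_t)((0x1000u - (es & 0x0FFFu)) / 0x20u);
}

/* Plan the kernel loads. `first` is the kernel's first disk sector and
 * `ksectors` its length. ES starts at 0x1000. Returns the number of
 * loads written to plan[], or -1 if plan[] is too small. */
int plan_loads(uint16_t first, uint16_t ksectors, struct load *plan, int max)
{
    uint16_t es = 0x1000, sec = first;
    uint16_t end = (uint16_t)(first + ksectors);
    int n = 0;

    while (sec < end) {
        uint16_t nsec = (uint16_t)(CYL - sec % CYL);  /* rest of cylinder */
        uint16_t room = room_to_64k(es);
        if (nsec > room)
            nsec = room;                  /* stop at the 64 KB boundary */
        if (n == max)
            return -1;
        plan[n].sector = sec;
        plan[n].nsec   = nsec;
        plan[n].es     = es;
        n++;
        sec = (uint16_t)(sec + nsec);
        es  = (uint16_t)(es + nsec * 0x20u);
    }
    return n;
}
```

For a kernel of 200 sectors starting at sector 6, the plan is 30 sectors
(rest of cylinder 0), two full cylinders, then 26 sectors up to the boundary
at segment 0x2000, the remaining 10 sectors of that cylinder, and two more
full cylinders.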


1b. Booting Image files in a file system.

Our second booter is to boot MTX from a MTX system (floppy) disk. A MTX system 
disk is an EXT2 file system with 1 KB block size. Block 0, which is not used by
the file system, contains the MTX booter. Bootable MTX kernel images are files 
in the /boot directory. Upon booting up, the MTX kernel mounts the same boot
disk as the root file system.

When booting an OS from a file system, the problem is essentially how to find 
the file's inode. The reader may refer to Chapter 1 for how to traverse an EXT2
file system. Here we only give a brief review of the steps. Assume that the file
name is /boot/mtx. First, read in the 0th group descriptor to find the start 
block of the inode table. Then read in the root inode, which is inode number 2
in the inode table. From the root inode's data blocks, search for the first 
component name, boot. Once the entry boot is found, we know its inode number. 
Use Mailman's algorithm to convert the inode number to the disk block containing
the inode and its offset in that block. Read in the inode of /boot and repeat 
the search for the next component mtx. If the search steps succeed, we should 
have the image file's inode in memory. The image's size and disk blocks can all
be determined from the inode. Then we can load the image by loading its disk 
blocks.
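
The Mailman's algorithm step for inodes can be sketched as follows. This is
host-side C with hypothetical names (struct inode_pos, locate_inode). It
assumes 1 KB blocks and 128-byte inodes, so that a block holds 8 inodes;
a file system with a different inode size would need a different constant.

```c
#include <stdint.h>

#define BLKSIZE        1024u
#define INODE_SIZE     128u                      /* assumed inode size */
#define INODES_PER_BLK (BLKSIZE / INODE_SIZE)    /* 8 */

struct inode_pos { uint16_t blk; uint16_t index; };

/* Locate inode number ino (counted from 1), given the first block of
 * the inode table, iblk. Returns the disk block and the inode's index
 * within that block. */
struct inode_pos locate_inode(uint16_t ino, uint16_t iblk)
{
    struct inode_pos p;
    uint16_t n = (uint16_t)(ino - 1);            /* inode numbers start at 1 */
    p.blk   = (uint16_t)(iblk + n / INODES_PER_BLK);
    p.index = (uint16_t)(n % INODES_PER_BLK);
    return p;
}
```

With the inode table starting at block 5, the root inode (number 2) is entry 1
of block 5, and inode 9 is entry 0 of block 6.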

In order for a booter to access disk blocks, we modify the assembly function, 
readfd(), with an additional parameter, buf, which is passed to BIOS in BX as 
the offset address from the loading segment in ES.

       .globl _NSEC            ! NSEC = 2
       !---------------------------------------
       ! readfd(cyl, head, sector, buf)
       !        4     6     8      10
       !---------------------------------------
_readfd:                             
        push  bp
	mov   bp,sp            ! bp = stack frame pointer

        movb  dl, #0x00        ! drive 0=FD0
        movb  dh, 6[bp]        ! head
        movb  cl, 8[bp]        ! sector
        incb  cl               ! inc sector by 1 to suit BIOS
        movb  ch, 4[bp]        ! cyl
        mov   bx, 10[bp]       ! BX=buf ==> memory addr=(ES,BX)
        movb  ah, #0x02        ! AH=2 = READ
        movb  al, _NSEC        ! READ NSEC sectors to (ES, BX)

        int  0x13              ! call BIOS to read the block 
        jb   _error            ! to error if CarryBit is on [read failed]

        pop  bp                
	ret

Corresponding to this, we replace getsector() in C with getblk(), which takes a
block number and buf as parameters. When the booter starts, ES points at the
same segment as the booter. Within the booter's C code, if buf is global, it is
relative to DS. If it is local, it is relative to SS. Thus, no matter how we
define buf, the loading address is always in the booter's segment. When loading
the blocks of an OS image, we set ES to successive loading segments and keep
buf=0.
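
The conversion from a 1 KB block number to CHS form can be sketched in
host-side C. A block is 2 sectors, so a cylinder of 36 sectors holds 18
blocks and a track holds 9. The names struct blk_chs and block_to_chs are
hypothetical.

```c
#include <stdint.h>

struct blk_chs { uint16_t cyl, head, sector; };   /* all counted from 0 */

/* 1 KB block number -> CHS of the block's first sector.
 * A cylinder holds 18 blocks; a track holds 9 blocks. */
struct blk_chs block_to_chs(uint16_t blk)
{
    struct blk_chs p;
    p.cyl    = blk / 18;
    p.head   = (blk % 18) / 9;
    p.sector = (uint16_t)(((blk % 18) % 9) * 2);  /* 2 sectors per block */
    return p;
}
```

For example, block 8 maps to (0,0,16), block 9 to (0,1,0) and block 18 to
(1,0,0), which agrees with converting linear sector 2*blk directly.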

Since the assembly code is essentially the same as before, we only show the 
booter's C code.   

/*******************************************************
*            MTX booter : bc.c file                    *
*******************************************************/
#define TRK 18
#define CYL 36
#define BLK 1024

typedef unsigned char  u8;
typedef unsigned short u16;
typedef unsigned long  u32;

#include "ext2.h"
typedef struct ext2_group_desc  GD;
typedef struct ext2_inode       INODE;
typedef struct ext2_dir_entry_2 DIR;

#define INODES_PER_BLK (BLK/sizeof(INODE))

int prints(s) char *s;
{
   while(*s)
     putc(*s++);
}

int gets(s) char *s;
{ 
    while ( (*s=getc()) != '\r')
      putc(*s++);
    *s = 0;
}

u16 NSEC = 2;
char buf1[BLK], buf2[BLK];

u16 getblk(blk, buf) u16 blk; char *buf;
{
    readfd(blk/TRK, ((blk)%TRK)/9, (((blk)%TRK)%9)<<1, buf);
}

u16 search(ip, name) INODE *ip; char *name;
{
   int i; char c;
   DIR  *dp; 
   
   for (i=0; i<12; i++){   // assume DIR only has direct blocks
       if ( (u16)ip->i_block[i] ){
          getblk((u16)ip->i_block[i], buf2);
          dp = (DIR *)buf2;

          while ((char *)dp < &buf2[1024]){
              c = dp->name[dp->name_len];  // save last byte of name[ ]

              dp->name[dp->name_len] = 0;   
	      prints(dp->name); putc(' ');

              if ( strcmp(dp->name, name) == 0 ){
                 prints("\n\r"); 
                 return((u16)dp->inode);
              }
              dp->name[dp->name_len] = c; // restore that last byte
              dp = (DIR *)((char *)dp + dp->rec_len);
          }
       }
   }
   error();
}

main()
{ 
  char   *cp, *name[2], filename[64];
  u16    i, ino, blk, iblk;
  u32    *up;

  GD    *gp;
  INODE *ip;
  DIR   *dp;

  name[0] = "boot";
  name[1] = filename;

  prints("bootname: ");  
  gets(filename);
  if (filename[0]==0)
      name[1]="mtx";

  /* read blk#2 to get group descriptor 0 */
  getblk(2, buf1);
  gp = (GD *)buf1;
  iblk = (u16)gp->bg_inode_table;      // inode table begin block#
  getblk(iblk, buf1);                  // read first inode block 

  ip = (INODE *)buf1 + 1;              // ip->root inode #2

  /* search for system name */
  for (i=0; i<2; i++){
      ino = search(ip, name[i]) - 1;
      getblk(iblk+(ino/INODES_PER_BLK), buf1);  // read block containing inode
      ip = (INODE *)buf1 + (ino % INODES_PER_BLK);  // ip-> new inode
  }
  /* read indirect block into buf2 */
  getblk((u16)ip->i_block[12], buf2);  // assume : MTX has indirect blocks

  setes(0x1000);
  /* load direct blocks */
  for (i=0; i<12; i++){
      getblk((u16)ip->i_block[i], 0);
      putc('*');
      inces();
  }
  /* load indirect blocks */
  up = (u32 *)buf2;      
  while(*up){
     getblk((u16)*up, 0); 
     putc('.');
     inces();
     up++;
  }
}  
------------------------------------------------------------------------------
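
The directory search used by the booter can also be tried out on a host
machine. The sketch below walks the entries of one directory block by
rec_len, as search() does, but compares names by length instead of
temporarily NUL-terminating them. It is host-side C, not BCC code. The
function names are hypothetical, and the entry layout (4-byte inode, 2-byte
rec_len, 1-byte name_len, 1-byte file type, then the name) is that of
ext2_dir_entry_2, read byte by byte in little-endian order.

```c
#include <stdint.h>
#include <string.h>

/* Little-endian readers for on-disk fields. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search one directory block for name.
 * Returns the entry's inode number, or 0 if the name is not found. */
uint32_t dir_lookup(const uint8_t *blk, uint16_t blksize, const char *name)
{
    size_t want = strlen(name);
    uint16_t off = 0;

    while (off + 8u <= blksize) {
        const uint8_t *e = blk + off;
        uint16_t rec_len  = rd16(e + 4);
        uint8_t  name_len = e[6];

        if (rec_len < 8 || off + rec_len > blksize)
            break;                        /* malformed entry: stop */
        if (rd32(e) != 0 && name_len == want &&
            memcmp(e + 8, name, want) == 0)
            return rd32(e);               /* found: inode number */
        off = (uint16_t)(off + rec_len);  /* step to the next entry */
    }
    return 0;
}
```

Comparing by name_len avoids modifying the block, at the cost of a few more
bytes of code, which may matter in a 512-byte or 1 KB booter.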

As before, the above MTX booter can be adapted to booting a Linux zImage from an
EXT2 file system. This is left as an exercise.