560 Notes #1 Booting OS


1. Booting in General:
   The process of starting up an operating system from a disk is called 
   booting or bootstrapping. Different machines may go through different 
   sequences of actions during booting. To be specific, we shall consider the 
   booting sequence of Intel 80x86 based PCs.

   Every PC has a ROM, which contains a set of programs called the BIOS.
   When power is turned on or following a reset, the CPU starts executing
   BIOS.  BIOS checks the system hardware and initializes itself, which includes 
   setting up the interrupt vectors in the low memory area to point to its
   service routines.  Then it starts to look for a device to boot. 
   The bootable devices and their order are recorded in a programmable CMOS
   memory. They include floppy disk, hard disk, CDROM and USB drive. The booting
   order is usually A: (floppy disk), then C: (first hard disk). If there is a
   diskette in A:, BIOS will try to boot from it. Otherwise, it will try to boot
   from C:, etc.

   A booter is a program that starts itself up first and then loads another 
   program, such as an operating system, into memory for execution. A booter 
   usually occupies the first sector (or block) of a bootable device. 

1.1. Boot from floppy disk.   
   When booting from a FD, BIOS loads the very first sector (512 bytes) of 
   the disk into (segment, offset)=(0x0000, 0x7C00), and jumps there to 
   execute the booter.  After this, it is entirely up to the booter code 
   to do the rest.
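
   To make the (segment, offset) notation concrete, here is a small sketch in
   standard C (meant to run on a host machine, not under BIOS). It assumes only
   the usual real-mode rule that the real address is segment*16 + offset; the
   function name phys_addr is made up for this example.

```c
#include <stdint.h>

/* Real-mode address translation: real address = segment * 16 + offset.
 * The sum can exceed 16 bits, so it is returned as a 32-bit value. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

   With this rule, (0x0000, 0x7C00) and (0x07C0, 0x0000) name the same byte,
   which is why booters may see either form in practice.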

   In order to make room for the OS to be loaded, the booter usually relocates
   itself to a high memory area, from where it continues to load the OS image 
   into memory. When loading completes, the booter simply transfers control to 
   the OS, causing it to start up.

1.2. Boot from hard disk
   Booting from a hard disk is only slightly more complex. A hard disk is 
   usually divided into several logically independent units, called 
   partitions.  The start cylinder, end cylinder and size of the partitions 
   are recorded in a Partition Table. The very first sector of a hard disk 
   is called the Master Boot Record (MBR). It contains a boot program, the 
   Partition Table, and the boot signature at the end. Each partition may 
   contain a bootable system. If so, each partition may have its own booter 
   in the first sector (or block) of that partition.
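
   As a rough illustration of the MBR layout, the following C sketch decodes
   one Partition Table entry on a host machine. It assumes the conventional 
   layout: 446 bytes of boot code, four 16-byte entries, then the signature 
   bytes 0x55, 0xAA at offsets 510-511. The struct and function names are made
   up for this example, and the cylinder/head/sector fields of each entry are
   skipped in favor of the start sector and the size in sectors.

```c
#include <stdint.h>

#define PTABLE_OFFSET 446   /* Partition Table follows the boot code */
#define PENTRY_SIZE   16    /* bytes per Partition Table entry */

struct partition {
    uint8_t  boot_flag;     /* 0x80 = active, 0x00 = inactive */
    uint8_t  type;          /* partition type ID */
    uint32_t start_sector;  /* first sector of the partition (linear) */
    uint32_t nr_sectors;    /* size of the partition in sectors */
};

/* Read a 32-bit little-endian value one byte at a time. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if the sector ends with the boot signature 0x55, 0xAA. */
int mbr_has_signature(const uint8_t *sector)
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Decode Partition Table entry n (0..3) from a 512-byte MBR sector. */
struct partition mbr_entry(const uint8_t *sector, int n)
{
    const uint8_t *e = sector + PTABLE_OFFSET + n * PENTRY_SIZE;
    struct partition p;

    p.boot_flag    = e[0];
    p.type         = e[4];
    p.start_sector = le32(e + 8);
    p.nr_sectors   = le32(e + 12);
    return p;
}
```

   A caller would typically read sector 0 of the disk into a 512-byte buffer,
   check mbr_has_signature() first, and only then look at the entries.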

   During booting, BIOS loads the MBR to (0x0000, 0x7C00) as usual, and turns 
   control over to it. Once execution starts, the MBR boot program has two
   options: 
   (1). it may boot an OS directly, as the Linux booters LILO (LInux LOader)
        and GRUB (GRand Unified Bootloader) do, OR
   (2). it may act as a CHAIN booter, in which case it searches for an active 
        partition to boot, and then loads the partition's boot sector to 
        (0x0000,0x7c00) and turns control over to it. It is now up to the 
        partition's booter to load and start the OS from that partition. 
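
   The search step of a CHAIN booter in option (2) can be sketched in C as
   follows. This is only a host-side illustration: it assumes the conventional
   MBR layout (Partition Table at byte offset 446, 16 bytes per entry, first
   byte of an entry = 0x80 for an active partition), and the function name is
   hypothetical. A real chain booter would do the same scan in 16-bit code and
   then load the partition's boot sector through BIOS.

```c
#include <stdint.h>

#define PTABLE_OFFSET 446   /* Partition Table starts here in the MBR */
#define PENTRY_SIZE   16    /* bytes per entry */
#define ACTIVE_FLAG   0x80  /* first byte of an active entry */

/* Return the index (0..3) of the first active partition, or -1 if none. */
int find_active_partition(const uint8_t *mbr)
{
    int i;

    for (i = 0; i < 4; i++) {
        if (mbr[PTABLE_OFFSET + i * PENTRY_SIZE] == ACTIVE_FLAG)
            return i;
    }
    return -1;
}
```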

1.3. Boot from CD(DVD)ROM:
    A bootable CD(DVD)ROM is created by first creating an ISO 9660 file system
    and then writing it to a CD or DVD disc. A bootable CD(DVD) can be booted
    in one of three ways:
    (1). Emulating a floppy disk: the boot image is a bootable FD image. Upon
         booting up, the FD image is accessed as drive A: while the physical A: 
         drive is demoted to B: drive.
    (2). Emulating a hard disk: the boot image is a bootable HD image with a 
         single partition. Upon booting up, the HD image becomes C: while the
         physical C: drive is demoted to D:, etc.
    (3). NON-emulated booting: the boot image is booted as is. Upon booting up,
         the CD (DVD) is accessed as a device determined by BIOS, i.e. device
         number = a byte value arbitrarily chosen by BIOS.
     
    Note that in all the above cases, although the boot device is known,
    the contents of the CD(DVD)ROM are NOT accessible unless the booted up
    environment includes drivers to access the file system on the CD(DVD).
    
1.4. Boot from USB drive: This is almost identical to booting from HD. During
     booting, BIOS emulates a USB drive as C: drive.

2. Bootable MTX Image

MTX is a small operating system developed by KCW. It is designed to run on
Intel-based PCs in the real (unprotected) mode or on any PC emulator, such as 
DOSEMU, QEMU, BOCHS, VMware, etc. MTX can run from either a floppy disk or a 
hard disk partition. For simplicity (and safety), we shall begin with MTX on 
floppy disks.

From the samples/A1/ directory, you can download the file mtximage.gz.
Uncompress and dump it to a floppy disk by
           gunzip mtximage.gz
           dd if=mtximage of=/dev/fd0
The resulting floppy disk is a bootable MTX system composed of the following

    block0 | EXT2 file system (1 KB blocks)
    BOOTER |             /
                         |
               ------------------------
               bin  boot dev  etc  user
                     | 
                    mtx

where block0 contains an MTX booter and /boot/mtx is a bootable MTX image file.
During booting, the MTX booter is loaded into memory and runs first. It prompts
for the name of an MTX image (file) in the /boot directory to boot. The default
image name is mtx. It loads the image to the segment 0x1000, and then jumps to
(segment, offset)=(0x1000, 0) to start up MTX.
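
To load a file from this floppy, a booter has to turn 1 KB file system block
numbers into the (cylinder, head, sector) triples that the disk understands.
The sketch below shows one way to do that arithmetic in C. It assumes a
standard 1.44MB floppy geometry (80 cylinders, 2 heads, 18 sectors per track),
which the text above does not state, and the names are invented for this
example. It is not the actual MTX booter code.

```c
/* Assumed geometry of a 1.44MB floppy disk. */
#define SECTORS_PER_TRACK 18
#define HEADS             2
#define SECTORS_PER_BLOCK 2   /* one 1 KB block = two 512-byte sectors */

struct chs {
    int cyl;     /* cylinder, counted from 0 */
    int head;    /* head, counted from 0 */
    int sector;  /* sector within the track, counted from 1 */
};

/* Convert a 1 KB block number to the CHS address of its first sector. */
struct chs block_to_chs(int blk)
{
    int linear = blk * SECTORS_PER_BLOCK;  /* linear sector, counted from 0 */
    struct chs r;

    r.cyl    = linear / (HEADS * SECTORS_PER_TRACK);
    r.head   = (linear / SECTORS_PER_TRACK) % HEADS;
    r.sector = linear % SECTORS_PER_TRACK + 1;
    return r;
}
```

Because 18 is even and each block is 2 sectors, a block never straddles a track
boundary under this geometry, so both sectors of a block can be read in one
request.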
  

3. Bootable Linux Image (and also MTX in protected 32-bit mode)

3-1. Bootable Linux Image: 

   Bootable Linux images were originally designed for booting from floppy disks.
Since Kernel 2.6, booting Linux from floppy disks is no longer supported. A
bootable (Big) Linux image (bzImage) is composed of the following components:

     Sector#0        : BOOT, a booter program in 16-bit machine code for booting
                       Linux from floppy disk. For non-floppy disk booting, this
                       sector contains some boot parameters in bytes 497-509.
     Sector#1 to N-1 : SETUP, also in 16-bit machine code;
     Sector#N onward : Linux Kernel; typically 2MB but up to 5MB in size

    Depending on the size of SETUP, N varies from 10 to 24. To simplify
    booting, we shall assume N=24 (for reasons shown later).
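
    A booter does not have to guess N. In the Linux boot protocol, byte 497 of
    Sector#0 (the first of the boot parameter bytes 497-509) holds the number 
    of SETUP sectors, with 0 meaning 4. The C sketch below, written for a host
    machine with made-up function names, derives N and the byte offset of the
    kernel inside the image file from that field.

```c
#include <stdint.h>

#define SECTOR_SIZE        512
#define SETUP_SECTS_OFFSET 497  /* number of SETUP sectors, per boot protocol */

/* Return N, the sector number at which the Linux kernel begins:
 * one BOOT sector plus the SETUP sectors. */
int kernel_start_sector(const uint8_t *boot_sector)
{
    int setup_sects = boot_sector[SETUP_SECTS_OFFSET];

    if (setup_sects == 0)       /* the protocol defines 0 to mean 4 */
        setup_sects = 4;
    return 1 + setup_sects;
}

/* Byte offset of the kernel within the bzImage file. */
long kernel_byte_offset(const uint8_t *boot_sector)
{
    return (long)kernel_start_sector(boot_sector) * SECTOR_SIZE;
}
```

    For example, an image with 23 SETUP sectors gives N=24, so the kernel
    begins at byte 24*512 = 12288 of the file.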

3-2. Linux Booting Sequence: 
     
2-1. BIOS loads and executes a Linux booter. The Linux booter FINDs the
     Linux bootable image, a bzImage file of the above layout.

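2-2. Linux booter loads Sector0 + SETUP to segment 0x9000.
     Then it loads the Linux kernel to the real address 0x100000, i.e. 1MB in
     high memory (written here as segment 0x10000, a value that does not fit
     in a 16-bit segment register). When loading completes, the Linux booter
     jumps to (0x9020, 0), which is 512 bytes past Sector0, to execute SETUP.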

2-3. SETUP:
     SETUP sets up the starting environment for the Linux Kernel, such as
     the root device and the video display mode, and switches the CPU from
     16-bit mode to 32-bit mode. It then jumps to (0x10000, 0) to start up Linux.

     If the Linux image is STORED AS A FILE, such as /boot/bzImage, on a hard
     disk, the block size is usually 4KB (8 sectors). If N is a multiple of 8,
     e.g. 24, the Linux kernel begins at a block boundary, which makes loading
     the Linux kernel easier.
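
     The alignment argument is plain arithmetic, sketched here in C with 
     made-up function names. It assumes 512-byte sectors and 4KB file blocks,
     as in the paragraph above.

```c
#define SECTOR_SIZE 512
#define BLOCK_SIZE  4096   /* 4KB file system block = 8 sectors */

/* Index of the file block that contains the first byte of the kernel. */
long kernel_first_block(int n)
{
    return (long)n * SECTOR_SIZE / BLOCK_SIZE;
}

/* Offset of the kernel's first byte within that block; 0 means aligned. */
long kernel_block_offset(int n)
{
    return (long)n * SECTOR_SIZE % BLOCK_SIZE;
}
```

     With N=24 the kernel starts exactly at the beginning of file block 3, so
     whole blocks can be copied straight to their destination. With N=10 it
     starts 1024 bytes into block 1, and the booter would have to handle that
     partial block separately.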
      
4. Working Environment:

   When the PC starts up, it is in the so-called 16-bit real (unprotected) 
   mode. While in this mode, it can only execute 16-bit code (and access
   1M bytes of memory).

   During booting, we must use BIOS to do I/O because there is NO operating
   system yet. BIOS functions are called by the  
               INT  #n   
   instruction, where the number n indicates which BIOS function we are 
   calling. Parameters to BIOS functions are passed in CPU registers.
   Return values also come back in registers, usually the AX register.
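
   As an example of passing parameters in registers, the sketch below packs
   the register values for the BIOS disk read function (INT 0x13 with AH=0x02):
   AL = sector count, CH = low 8 bits of the cylinder, CL = sector number plus
   the top 2 cylinder bits, DH = head, DL = drive, ES:BX = buffer address. The
   struct and function names are made up, and the code only computes the values
   on a host machine. Issuing the INT instruction itself still needs assembly.

```c
#include <stdint.h>

/* Register values to load before issuing INT 0x13 (hypothetical holder). */
struct regs {
    uint16_t ax, bx, cx, dx, es;
};

/* Pack the parameters of a BIOS disk read (INT 0x13, AH = 0x02). */
struct regs int13_read_regs(int drive, int cyl, int head, int sector,
                            int count, uint16_t buf_seg, uint16_t buf_off)
{
    struct regs r;

    /* AH = 0x02 (read sectors), AL = number of sectors to read */
    r.ax = (uint16_t)(0x0200 | (count & 0xFF));
    /* CH = cylinder bits 0-7; CL bits 6-7 = cylinder bits 8-9;
     * CL bits 0-5 = sector number (counted from 1) */
    r.cx = (uint16_t)(((cyl & 0xFF) << 8) | ((cyl >> 2) & 0xC0) |
                      (sector & 0x3F));
    /* DH = head, DL = drive (0x00 = A:, 0x80 = first hard disk) */
    r.dx = (uint16_t)(((head & 0xFF) << 8) | (drive & 0xFF));
    /* ES:BX = (segment, offset) of the memory buffer to read into */
    r.es = buf_seg;
    r.bx = buf_off;
    return r;
}
```

   For instance, reading 1 sector from cylinder 0, head 0, sector 1 of drive A:
   into (0x1000, 0) yields AX=0x0201, CX=0x0001, DX=0x0000, ES=0x1000, BX=0.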

   Boot programs are usually written entirely in (16-bit) assembly code
   because their logic is quite simple, namely, to load the disk sectors
   of an OS into memory. They do not have to know how to FIND the image 
   of an OS. 

   Based on the above discussions, a quick summary is in order:

                    During booting:

   (1). We must call BIOS functions to do I/O, so assembly code is
        unavoidable!  But this should be kept to a minimum.  

   (2). Our boot program must FIND a Linux bootable image file to load.
        Although it is possible to write such a program in assembly, it would 
        be rather silly to do so. The major part of it should be written in C.

   (3). Linux's gcc compiler generates 32-bit code, which is unusable during 
        booting.  We must use a C compiler and a linker that generate 16-bit 
        machine code.

   (4). To meet these requirements, we will use the bcc, as86 and ld86 package,
        which runs under Linux but generates 16-bit code for Intel processors.

5. Boot Generic Linux Kernel with initrd (Initial Ramdisk) support:
   The configurations of Linux systems vary greatly. It is impractical to
   generate a Linux kernel with all possible device drivers built-in. A common
   way of dealing with different Linux configurations is to compile the device
   drivers as modules, and generate a generic bootable kernel that can boot
   up and run off a RAMdisk (initrd) first. The initrd provides a minimum root 
   file system for the Linux kernel, including a sh interpreter. While in this
   simple environment, Linux can execute an init sh script to install the 
   needed device drivers for the REAL root file system. After that, the kernel
   can umount the initrd and mount the REAL root file system.

   As an example, I have installed Linux on a USB drive. During booting, the 
   Linux (2.6.27.7 SMP kernel) is booted up first with an initrd image as root. 
   Then it executes an init file to install the USB driver modules. Then it 
   switches the root device from initrd to the USB drive and runs on the USB 
   drive.
   Another common example is to install Linux from a bootable CD (DVD). During
   booting, a generic Linux kernel is booted up from the CD with an initrd
   image as root. It then installs the CD (DVD) driver modules, allowing the
   Linux kernel to mount and read the CD(DVD) contents, which are Linux
   installation files in either tgz or RPM formats.