Hard Disk Booter
An HD booter boots large Linux kernels (bzImage) from hard disk
partitions. The HD booter consists of 4 files: bs.s in assembly and bc.c in C,
which includes io.c and bootLinux.c. During booting, it displays the hard disk
partitions and prompts for a partition number to boot. If the partition type is
Linux, it allows the user to enter a filename, or choose the default /vmlinuz,
to boot. In addition, it supports loading of initial ramdisk (initrd)
images. For non-Linux partitions, it acts as a chain-booter to boot other
operating systems, e.g. Windows.
The HD booter consists of 5 logical parts. Each part is essentially an
independent programming task, which can be developed and tested separately
before being adapted to the 16-bit environment for booting. The following is a
brief description of the booter's logical components.
1. Terminal I/O and memory access functions:
A hard disk booter is no longer limited to 512 or 1024 bytes. With a larger
code size, it should provide a better user interface during booting. Accordingly,
we implement a set of I/O functions, which are based on getc()/putc() of BIOS.
Among these, we only outline the implementation of a rudimentary printf()
function for formatted printing. First, we implement a printu() for printing
unsigned short integers.
char *ctable = "0123456789ABCDEF";
u16 BASE = 10;
int rpu(x) u16 x;
{
   char c;
   if (x){
      c = ctable[x % BASE];
      rpu(x / BASE);
      putc(c);
   }
}
int printu(x) u16 x;
{
   (x) ? rpu(x) : putc('0');
   putc(' ');
}
where rpu(x) recursively generates the digits of x in base BASE and prints them
on return. For example, if x=123, the digits are generated in the order of '3',
'2', '1' but are printed as '1' '2' '3' as they should. With printu(),
implementing printd(), which prints signed integers, becomes trivial. By setting
BASE = 16, we can print in hex. By changing the parameter type to u32, we can
print long values, e.g. LBA sector and inode numbers. Assume that we have
prints(), printd(), printu(), printx(), printl() and printX(), where printl()
and printX() print long values in decimal and hex, respectively. It is easy to
implement a simple printf(char *fmt, ...) function for formatted printing, where
fmt is a format string containing conversion symbols %c, %s, %u, %d, %x, %l, %X.
int printf(fmt) char *fmt;
{
   // print items passed in as
   // char, string, unsigned short, int, hex, long, long-hex
   //  %c     %s          %u         %d   %x   %l     %X
}
This printf() function does not support field width or precision but it is
adequate for the simple printing task during booting. It improves the
readability of the booter code.
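As a rough illustration of how such a printf() might be put together, here is a
minimal sketch in hosted ANSI C rather than the booter's 16-bit K&R C. The names
bprintf(), print_unsigned(), print_signed() and out_char() are hypothetical;
out_char() appends to a buffer and merely stands in for the BIOS-based putc().
The long conversions %l and %X are omitted, and, unlike printu() above, no
trailing space is printed after a number.

```c
#include <stdarg.h>

typedef unsigned short u16;

/* Stand-in for the BIOS-based putc(): append to a buffer (assumption). */
char out_buf[256];
int  out_len = 0;

void out_char(char c)
{
    if (out_len < (int)sizeof(out_buf) - 1) {
        out_buf[out_len++] = c;
        out_buf[out_len] = '\0';
    }
}

const char *ctable = "0123456789ABCDEF";

/* Recursively emit the digits of x in the given base, most significant first. */
void rpu(u16 x, u16 base)
{
    if (x) {
        char c = ctable[x % base];
        rpu((u16)(x / base), base);
        out_char(c);
    }
}

void print_unsigned(u16 x, u16 base)
{
    if (x) rpu(x, base);
    else   out_char('0');
}

/* Signed print: emit '-' and then the magnitude (16-bit range assumed). */
void print_signed(int x)
{
    if (x < 0) {
        out_char('-');
        print_unsigned((u16)(-x), 10);
    } else {
        print_unsigned((u16)x, 10);
    }
}

/* Minimal formatted print supporting %c %s %u %d %x only. */
void bprintf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') { out_char(*fmt); continue; }
        fmt++;
        if (*fmt == '\0') break;            /* stray '%' at the end */
        switch (*fmt) {
        case 'c': out_char((char)va_arg(ap, int)); break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s) out_char(*s++);
            break;
        }
        case 'u': print_unsigned((u16)va_arg(ap, unsigned int), 10); break;
        case 'd': print_signed(va_arg(ap, int)); break;
        case 'x': print_unsigned((u16)va_arg(ap, unsigned int), 16); break;
        default:  out_char(*fmt); break;    /* unknown symbol: print as is */
        }
    }
    va_end(ap);
}
```

A call such as bprintf("part %d type %x", 2, 0x83) would leave the formatted
text in out_buf. In the booter, out_char() would simply call putc().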
When booting a Linux bzImage, the booter must get the number of SETUP sectors
to determine how to load the various pieces of the image. After loading, it must
set some of the boot parameters in the loaded BOOT and SETUP sectors for the
Linux kernel. (Boot parameters and their contents are listed in the booter's C
code.) For this purpose, we implement the functions
#define BOOTSEG 0x9800
int get_byte(segment, offset) u16 segment, offset;
{
   u8 byte;               // local variable, lives on the stack (SS)
   setds(segment);        // set DS to segment
   byte = *(u8 *)offset;  // read the byte at DS:offset
   setds(BOOTSEG);        // restore DS
   return byte;
}
int put_byte(byte, segment, offset) u8 byte; u16 segment, offset;
{
   setds(segment);        // set DS to segment
   *(u8 *)offset = byte;  // write the byte at DS:offset
   setds(BOOTSEG);        // restore DS
}
Then, you can implement get_word()/put_word() functions, which allow the booter
to access memory outside its own segment. These I/O and memory access functions
are in the io.c file.
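The word versions follow directly from the byte versions. The sketch below is a
host-side simulation, not booter code: instead of changing DS, it models
real-mode memory as an array (sim_mem, an assumption for illustration) and
computes the linear address as segment*16 + offset. It only shows the address
arithmetic and the little-endian byte order that get_word()/put_word() rely on.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;

/* Simulated real-mode memory: 1MB plus the 64KB reachable above it
 * with segment values near 0xFFFF (assumption: stands in for RAM). */
u8 sim_mem[0x100000 + 0x10000];

/* Real-mode address translation: linear = segment * 16 + offset. */
uint32_t linear(u16 segment, u16 offset)
{
    return ((uint32_t)segment << 4) + offset;
}

u8 get_byte(u16 segment, u16 offset)
{
    return sim_mem[linear(segment, offset)];
}

void put_byte(u8 byte, u16 segment, u16 offset)
{
    sim_mem[linear(segment, offset)] = byte;
}

/* x86 is little-endian: the low byte is stored at the lower address. */
u16 get_word(u16 segment, u16 offset)
{
    u16 lo = get_byte(segment, offset);
    u16 hi = get_byte(segment, (u16)(offset + 1));
    return (u16)(lo | (hi << 8));
}

void put_word(u16 word, u16 segment, u16 offset)
{
    put_byte((u8)(word & 0xFF), segment, offset);
    put_byte((u8)(word >> 8), segment, (u16)(offset + 1));
}
```

Note that different (segment, offset) pairs can name the same byte; for
example, (0x9000, 0x200) and (0x9020, 0) both translate to 0x90200.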
2. Read hard disk and access high memory
Unlike floppy disks, which use CHS addressing, newer hard disks use Logical
Block Addressing (LBA), in which disk sector numbers are 32 bits. In order to
read hard disk sectors, we have to use the extended BIOS INT13-42 (INT 0x13
AH=0x42) function, whose parameters are specified in a Disk Address Packet (DAP)
structure.
// DAP struct for INT13-42
struct dap{
   u8  len;       // dap length=0x10 (16 bytes)
   u8  zero;      // must be 0
   u16 nsector;   // number of sectors to read; 1 to 127
   u16 addr;      // memory address = (segment,addr)
   u16 segment;   // segment value
   u32 sectorLo;  // low 4 bytes of LBA sector#
   u32 sectorHi;  // high 4 bytes of LBA sector#
};
To call INT13-42, we define a global dap struct and initialize it once, as in
struct dap dap, *dp=&dap; // dap and dp are globals in C
dp->len = 0x10; // dap length = 0x10
dp->zero = 0; // this field must be 0
dp->sectorHi = 0; // assume 32-bit LBA, high 4 bytes always 0
// other fields will be set when the dap is used in calls
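Since BIOS interprets the DAP byte by byte, the struct must be exactly 16 bytes
with no padding. The following sketch, written for a hosted C compiler with
fixed-width types, shows one way to check the layout and to fill in all fields
in one place. fill_dap() is a hypothetical helper, not part of the booter; it
assumes a 32-bit LBA, as in the text.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Same field order as the dap struct in the text. */
struct dap {
    u8  len;        /* dap length = 0x10 (16 bytes) */
    u8  zero;       /* must be 0 */
    u16 nsector;    /* number of sectors to read */
    u16 addr;       /* offset part of the (segment, addr) buffer address */
    u16 segment;    /* segment part of the buffer address */
    u32 sectorLo;   /* low 4 bytes of LBA sector# */
    u32 sectorHi;   /* high 4 bytes of LBA sector# */
};

/* Hypothetical helper: set every field for one read request. */
void fill_dap(struct dap *dp, u32 sector, u16 nsector, u16 segment, u16 offset)
{
    dp->len      = 0x10;
    dp->zero     = 0;
    dp->nsector  = nsector;
    dp->addr     = offset;
    dp->segment  = segment;
    dp->sectorLo = sector;
    dp->sectorHi = 0;      /* assume 32-bit LBA */
}
```

On a host compiler, sizeof(struct dap) and offsetof() can be used to confirm
that the fields sit at byte offsets 0, 1, 2, 4, 6, 8 and 12.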
Within the C code, you may set dap's segment, and then call getSector() to load
a disk sector into the memory location (segment, offset), as in
// Assume: dap.segment has been set by dp->segment = segment;
int getSector(sector, offset) u32 sector; u16 offset;
{
   dp->nsector  = 1;
   dp->addr     = offset;
   dp->sectorLo = sector;
   diskr();
}
where diskr() is in assembly, which uses the global dap to call BIOS INT13-42.
!-------------------- assembly code ------------------------------
.globl _diskr,_dap ! _dap is a global dap struct in C
_diskr:
mov dx, #0x0080 ! device=0x80=first hard drive
mov ax, #0x4200 ! aH=0x42
mov si, #_dap ! (DS,SI) point at _dap in booter's DS
int 0x13 ! call BIOS INT13-42 to read sectors
jb _error ! to error() if CarryBit is set (read failed)
ret
!--------------------------------------------------------------------
Similarly, the function
int getblk(blk, offset, nblk) u32 blk; u16 offset, nblk;
{
   dp->nsector  = nblk*SECTORS_PER_BLOCK;  // max. value = 127
   dp->addr     = offset;
   dp->sectorLo = blk*SECTORS_PER_BLOCK;
   diskr();
}
loads nblk contiguous disk blocks into memory, beginning from (segment,offset),
where nblk <= 15 (for 4KB blocks of 8 sectors each) because the sector count in
the DAP is really a u8, which the BIOS limits to 127.
2-2 Load kernel and initrd images to high memory:
During booting, bzImage's BOOT+SETUP are loaded to segment 0x9000 as before but
the kernel is loaded to the real address 0x100000 (1MB) in high memory. If a
ramdisk image is specified, it is also loaded to high memory. Since the PC is in
real mode during booting, it cannot access high memory directly. A booter may
access high memory by switching the PC to protected mode and then switching back
to real mode afterwards, which requires a lot of work. A better way is to use
the extended BIOS INT15-87 function, which is designed to copy memory between
real and protected modes. Parameters to INT15-87 are specified in a Global
Descriptor Table (GDT), which is
struct GDT
{
   u32 zeros[4];        // 16 bytes 0's for BIOS to use
   // src address
   u16 src_seg_limit;   // 0xFFFF = 64KB
   u32 src_addr;        // low 3 bytes of src address, high byte=0x93
   u16 src_hiword;      // low byte=0x93, high byte=4th byte of src address
   // dest address
   u16 dest_seg_limit;  // 0xFFFF = 64KB
   u32 dest_addr;       // low 3 bytes of dest address, high byte=0x93
   u16 dest_hiword;     // low byte=0x93, high byte=4th byte of dest address
   // BIOS CS DS
   u32 bzeros[4];       // 16 bytes 0's for BIOS to use
};
The GDT specifies a src address and a dest address; both are 32-bit real
addresses. However, the bytes that form these addresses are not adjacent, which
makes them hard to access. For convenience, both src_addr and dest_addr are
defined as u32 but only the low 3 bytes are part of the address; the high byte
is the access rights 0x93. Similarly, both src_hiword and dest_hiword are
defined as u16 but only the high byte is the 4th address byte; the low byte
is again the access rights 0x93. For example, if we want to copy from the real
address 0x00010000 to 0x01000000 (16MB), a GDT can be initialized as follows.
init_gdt(p) struct GDT *p;
{
   int i;
   for (i=0; i<4; i++){
      p->zeros[i] = p->bzeros[i] = 0;
   }
   p->src_seg_limit = p->dest_seg_limit = 0xFFFF; // 64KB segments
   p->src_addr   = 0x93010000;   // bytes 0x00 00 01 93
   p->dest_addr  = 0x93000000;   // bytes 0x00 00 00 93
   p->src_hiword = 0x0093;       // bytes 0x93 00
   p->dest_hiword= 0x0193;       // bytes 0x93 01
}
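The constants above are easier to follow when derived from an ordinary 32-bit
address. The sketch below shows one way to split a real address into the addr
and hiword values used in the text, and to recombine them. struct seg_desc,
set_desc() and desc_to_addr() are hypothetical names; the struct is used here
only to compute field values on a host compiler, not to reproduce the exact
in-memory layout.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define ACCESS 0x93u     /* access rights byte, as in the text */

/* One address entry (src or dest) with the fields used in the text. */
struct seg_desc {
    u16 seg_limit;   /* 0xFFFF = 64KB */
    u32 addr;        /* low 3 bytes = address bits 0-23, high byte = 0x93 */
    u16 hiword;      /* low byte = 0x93, high byte = address bits 24-31 */
};

/* Split a 32-bit real address into the addr and hiword fields. */
void set_desc(struct seg_desc *d, u32 real_addr)
{
    d->seg_limit = 0xFFFF;
    d->addr      = (real_addr & 0x00FFFFFFu) | ((u32)ACCESS << 24);
    d->hiword    = (u16)(((real_addr >> 24) << 8) | ACCESS);
}

/* Recombine the scattered address bytes into a 32-bit real address. */
u32 desc_to_addr(const struct seg_desc *d)
{
    return (d->addr & 0x00FFFFFFu) | ((u32)(d->hiword >> 8) << 24);
}
```

With real_addr = 0x00010000 this gives addr = 0x93010000 and hiword = 0x0093;
with real_addr = 0x01000000 it gives addr = 0x93000000 and hiword = 0x0193,
which are the values used in init_gdt().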
The following code segment shows how to copy 4096 bytes from 0x00010000 to
0x01000000.
C code:
struct GDT gdt; // define a gdt struct
init_gdt(&gdt); // initialize gdt as shown above
cp2himem(); // assembly code that does the copying
Assembly code:
.globl _cp2himem,_gdt ! _gdt is a global GDT from C
_cp2himem:
mov cx,#2048 ! CX=number of 2-byte words to copy
mov si,#_gdt ! (ES,SI) must point to the GDT struct; assumes ES=DS
mov ax,#0x8700 ! aH=0x87
int 0x15 ! call BIOS INT15-87
jc _error
ret
Based on these, we can load the blocks of an image file to high memory as
follows.
1. load a disk block (4KB or 8 sectors) to segment 0x1000;
2. cp2himem();
3. gdt.dest_addr += 4096; (advance the destination address in high memory)
4. repeat 1-3 for next block, etc.
This is the basic loading scheme of the booter. For fast loading, the booter's
load() function actually tries to load up to 15 contiguous blocks at a time,
which is limited by 127 sectors per disk read. It is also observed that many
newer BIOSes allow 128 sectors in INT13-42 calls. On such machines, we can load
a maximum of 16 consecutive blocks or 64 KB at a time. The 64 KB limit is due
to the segment limit in the GDT.
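The chunking arithmetic can be sketched as follows. This is not the booter's
load() function; it only plans the reads for a run of blocks that is assumed to
be contiguous on disk, splitting it into pieces of at most 15 blocks and
tracking where each piece should land in high memory. struct chunk and
plan_load() are hypothetical names for illustration.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define BLKSIZE             4096u   /* 4KB blocks = 8 sectors each */
#define MAX_BLOCKS_PER_READ 15u     /* 15 * 8 = 120 sectors <= 127 */

/* One planned read: nblk blocks starting at blk, destined for dest. */
struct chunk {
    u32 blk;     /* first disk block of this read */
    u16 nblk;    /* number of blocks in this read */
    u32 dest;    /* real address in high memory for this chunk */
};

/* Split nblocks contiguous blocks into reads of at most 15 blocks.
 * Returns the number of chunks written to out[] (at most max_out). */
int plan_load(u32 first_blk, u32 nblocks, u32 dest,
              struct chunk *out, int max_out)
{
    int n = 0;
    while (nblocks > 0 && n < max_out) {
        u16 nblk = (u16)(nblocks > MAX_BLOCKS_PER_READ
                         ? MAX_BLOCKS_PER_READ : nblocks);
        out[n].blk  = first_blk;
        out[n].nblk = nblk;
        out[n].dest = dest;
        first_blk += nblk;
        nblocks   -= nblk;
        dest      += (u32)nblk * BLKSIZE;  /* next chunk follows in memory */
        n++;
    }
    return n;
}
```

For each planned chunk, a booter would read nblk blocks into a low-memory
buffer, set the destination address in the GDT to dest, and copy nblk*4096
bytes to high memory.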
3. Display hard disk partitions
On a hard disk, the partition table is in the MBR sector at the byte offset
446 (0x1BE). The table has 4 entries, each 16 bytes long, as defined by
struct partition {
   u8  drive;          /* 0x80 - active */
   u8  head;           /* starting head */
   u8  sector;         /* starting sector */
   u8  cylinder;       /* starting cylinder */
   u8  sys_type;       /* partition type */
   u8  end_head;       /* end head */
   u8  end_sector;     /* end sector */
   u8  end_cylinder;   /* end cylinder */
   u32 start_sector;   /* starting sector counting from 0 */
   u32 nr_sectors;     /* number of sectors in partition */
};
where sys_type indicates the partition's file system type. You may consult
Linux's fdisk print out for a list of standard partition types.
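As an illustration, the sketch below extracts the fields a booter would display
from a 512-byte MBR image held in memory. It is written for a hosted C compiler
and decodes the 32-bit fields byte by byte, so it does not depend on struct
layout or host byte order. struct part_info, le32() and get_partition() are
hypothetical names; the check of the 0x55 0xAA signature in the last two bytes
is the usual MBR validity test rather than something taken from the booter.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

#define PTABLE_OFFSET 446   /* 0x1BE: start of the partition table */
#define PENTRY_SIZE   16    /* bytes per partition entry */

/* The fields the booter displays for each partition. */
struct part_info {
    u8  drive;          /* 0x80 = active */
    u8  sys_type;       /* partition type */
    u32 start_sector;   /* starting sector counting from 0 */
    u32 nr_sectors;     /* number of sectors in partition */
};

/* Decode a little-endian 32-bit value from 4 bytes. */
u32 le32(const u8 *p)
{
    return (u32)p[0] | ((u32)p[1] << 8) | ((u32)p[2] << 16) | ((u32)p[3] << 24);
}

/* Extract entry i (0..3) from a 512-byte MBR image.
 * Returns 0 on success, -1 on a bad index or missing signature. */
int get_partition(const u8 *mbr, int i, struct part_info *out)
{
    const u8 *e;
    if (i < 0 || i > 3)
        return -1;
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    e = mbr + PTABLE_OFFSET + PENTRY_SIZE * i;
    out->drive        = e[0];
    out->sys_type     = e[4];
    out->start_sector = le32(e + 8);
    out->nr_sectors   = le32(e + 12);
    return 0;
}
```

A display loop would call get_partition() for i = 0 to 3 and print sys_type,
start_sector and nr_sectors for each entry whose sys_type is nonzero.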
If a partition is EXTEND type (type=5), it can be further divided into more
partitions. The extended partitions form a linked list in the EXTEND partition
area. As an example, assume that partition P4 is EXTEND type. Then
   P4's startSector = MBR
                      P5's startSector
                      P6's MBR sector# = MBR
                      (relative to P4)   P6's startSector
                                         P7's MBR sector# (relative to P4) --> etc.
where all the (local) MBR's sector numbers are relative to P4's startSector.
As usual, the linked list ends with a 0 in the last local MBR.
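Walking this list might look like the sketch below, which runs against a small
simulated disk (sim_disk and read_sector() are stand-ins for real sector reads,
and walk_extended() is a hypothetical name). It assumes the usual convention
that, in each local MBR, the first entry's start sector is relative to that
local MBR, while the second entry's start sector, relative to P4's startSector,
locates the next local MBR.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t  u8;
typedef uint32_t u32;

#define SIM_SECTORS 32
u8 sim_disk[SIM_SECTORS][512];     /* simulated disk (assumption) */

/* Stand-in for a real sector read such as getSector(). */
int read_sector(u32 lba, u8 *buf)
{
    if (lba >= SIM_SECTORS)
        return -1;
    memcpy(buf, sim_disk[lba], 512);
    return 0;
}

/* Decode a little-endian 32-bit value from 4 bytes. */
u32 le32(const u8 *p)
{
    return (u32)p[0] | ((u32)p[1] << 8) | ((u32)p[2] << 16) | ((u32)p[3] << 24);
}

/* Follow the chain of local MBRs inside the extended partition that
 * starts at ext_start. Stores the absolute start sector of each logical
 * partition in starts[] and returns how many were found (at most max). */
int walk_extended(u32 ext_start, u32 *starts, int max)
{
    u8  buf[512];
    u32 local = ext_start;         /* absolute sector of current local MBR */
    int n = 0;

    while (n < max) {
        const u8 *e0, *e1;
        u32 next;
        if (read_sector(local, buf) < 0)
            break;
        e0 = buf + 446;            /* entry 0: the logical partition */
        e1 = e0 + 16;              /* entry 1: link to next local MBR */
        if (e0[4] == 0)            /* empty entry: nothing here */
            break;
        starts[n++] = local + le32(e0 + 8);  /* relative to local MBR */
        next = le32(e1 + 8);                 /* relative to ext_start */
        if (next == 0)                       /* 0 ends the list */
            break;
        local = ext_start + next;
    }
    return n;
}
```

The max bound also protects against a corrupted table whose links form a loop.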
In a partition table, the CHS values are valid only for disks smaller than 8GB.
For disks larger than 8GB (but fewer than 4G sectors), only the last 2 fields,
start_sector and nr_sectors, are meaningful. Therefore, the booter should only
display the type, start sector and size of the partitions.
4. Find Linux and initrd image files
The steps used to find a Linux bzImage or ramdisk image file are essentially
the same as those shown before. The main differences stem from the need to
traverse large EXT2/EXT3 file systems on hard disk, which are noted below.
1. In a hard disk partition, the super block of an EXT2/EXT3 file system is
always at the byte offset 1024. A booter must read the super block to get the
values of s_first_data_block, s_log_block_size, s_inodes_per_group and
s_inode_size, where s_log_block_size determines the block size, which in turn
determines group_desc_per_block, inodes_per_block, etc. These numbers are needed
when traversing the file system.
2. A large EXT2/EXT3 file system may have many groups. Group descriptors begin
at the block (1+s_first_data_block), which is usually 1. Given a group number,
we must find its group descriptor and then use the group descriptor to find the
group's inodes start block.
3. The central problem is how to convert an inode number to its inode. This
can be done by the Mailman's algorithm:
   Algorithm: Convert an inode number, ino, to the disk block# containing
              the inode and the inode's offset in that block:
   (1). Compute group# and offset# in that group
          group   = (ino-1) / inodes_per_group;
          inumber = (ino-1) % inodes_per_group;
   (2). Find the group's group descriptor
          gdblk = group / desc_per_block;  // which block this GD is in
          gdisp = group % desc_per_block;  // which GD in that block
   (3). Compute inode's block# and offset in that group
          blk  = inumber / inodes_per_block; // blk# relative to group inode_table
          disp = inumber % inodes_per_block; // inode offset in that block
   (4). Read group descriptor to get group's inode table start block#
          getblk(1+first_data_block+gdblk, buf, 1); // GD begins at 1+first_data_block
          gp = (GD *)buf + gdisp;          // it's this group desc.
          blk += gp->bg_inode_table;       // blk was relative to group's inode_table
          getblk(blk, buf, 1);             // read the disk block containing inode
          INODE *ip = (INODE *)buf + (disp*inode_ratio);
                                           // inode_ratio=2 if inode_size=256 bytes
When the algorithm ends, INODE *ip should point at the file's inode in memory.
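The arithmetic in steps (1) to (3), together with the values derived from the
super block, can be sketched as ordinary C functions that do no disk I/O. The
struct and function names below are hypothetical. The sketch assumes the
standard EXT2 facts that the block size is 1024 << s_log_block_size and that a
group descriptor is 32 bytes.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Values derived from the super block. */
struct fs_params {
    u32 block_size;
    u32 inodes_per_group;
    u32 inode_size;
    u32 desc_per_block;     /* group descriptors per block */
    u32 inodes_per_block;
};

/* Result of the Mailman's algorithm for one inode number. */
struct ino_loc {
    u32 group;    /* group containing the inode */
    u32 gdblk;    /* which group descriptor block (0 = first GD block) */
    u32 gdisp;    /* which descriptor within that block */
    u32 blk;      /* block# relative to the group's inode table */
    u32 disp;     /* inode index within that block */
};

void derive_params(struct fs_params *fs, u32 s_log_block_size,
                   u32 s_inodes_per_group, u32 s_inode_size)
{
    fs->block_size       = 1024u << s_log_block_size;
    fs->inodes_per_group = s_inodes_per_group;
    fs->inode_size       = s_inode_size;
    fs->desc_per_block   = fs->block_size / 32;   /* 32-byte descriptors */
    fs->inodes_per_block = fs->block_size / s_inode_size;
}

void locate_inode(const struct fs_params *fs, u32 ino, struct ino_loc *loc)
{
    u32 inumber;
    loc->group = (ino - 1) / fs->inodes_per_group;   /* inodes count from 1 */
    inumber    = (ino - 1) % fs->inodes_per_group;
    loc->gdblk = loc->group / fs->desc_per_block;
    loc->gdisp = loc->group % fs->desc_per_block;
    loc->blk   = inumber / fs->inodes_per_block;
    loc->disp  = inumber % fs->inodes_per_block;
}
```

For a file system with 4KB blocks, 8192 inodes per group and 256-byte inodes,
the root inode (ino=2) is in group 0, block 0 of the inode table, at index 1.
The caller would still add the group's bg_inode_table to blk, as in step (4).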
5. Load kernel and ramdisk images to high memory
With getblk() and cp2himem(), loading the kernel image to 0x00100000 (1MB and
above in high memory) is straightforward. The only complication is when the
kernel image does not begin at a block boundary. For example, if the number of
SETUP sectors is 12, then BOOT+SETUP occupy 13 sectors, so the last 3 sectors of
block 1 belong to the kernel and must be loaded to 0x100000 first before we can
load the remaining kernel by blocks. In contrast, if the number of SETUP sectors
is 23, then BOOT and SETUP are in the first 3 blocks and the kernel begins at
block #3. In this case, we can load the entire kernel by blocks without having
to deal with fractions of a block at the beginning.
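The block-boundary arithmetic can be sketched as a small function. It assumes
4KB blocks of 8 sectors, with BOOT in sector 0 of the image followed directly by
the SETUP sectors. struct kstart and kernel_start() are hypothetical names.

```c
#define SECTORS_PER_BLOCK 8u    /* 4KB block = 8 sectors of 512 bytes */

/* Where the kernel begins inside the bzImage file. */
struct kstart {
    unsigned first_sector;   /* first kernel sector in the image */
    unsigned first_block;    /* image block containing that sector */
    unsigned lead_sectors;   /* kernel sectors to load before whole-block
                                loading can begin (0 if block aligned) */
};

struct kstart kernel_start(unsigned setup_sects)
{
    struct kstart k;
    unsigned rem;

    k.first_sector = 1u + setup_sects;          /* BOOT + SETUP sectors */
    k.first_block  = k.first_sector / SECTORS_PER_BLOCK;
    rem            = k.first_sector % SECTORS_PER_BLOCK;
    k.lead_sectors = rem ? SECTORS_PER_BLOCK - rem : 0u;
    return k;
}
```

When lead_sectors is nonzero, those sectors at the tail of first_block go to
0x100000 first, and whole-block loading continues from first_block+1.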
Next, we consider loading ramdisk images. An excellent overview on Linux
initial ramdisk (initrd) by M.T.Jones is at
http://www.ibm.com/developerworks/linux/library/l-initrd.html
An initial ramdisk (initrd) is a small file system that is mounted by the Linux
kernel as a temporary root file system when the kernel starts. The initrd
contains a minimal set of directories and executables, such as a sh, the insmod
tool and the needed driver modules. While running on initrd, the kernel
typically executes a sh script, init, to install the needed driver modules and
activate the real root device. When the real root device becomes ready, the
kernel abandons the initrd and mounts the real root file system to complete a
2-stage boot up process.
Creating an initrd image used to be a tedious process until someone wrote a
sh script, mkinitrd, which is now available in almost all Linux distributions.
By default, mkinitrd creates an initrd.gz file and an initrd-tree directory
in the /boot/ directory. You can examine and modify files in the initrd-tree/
directory and use mkinitrd to generate a new initrd.gz. Older initrd.gz images
are compressed EXT2 file systems. You may uncompress an initrd.gz file, mount
it as a loop file system, then examine and/or modify its contents. Newer initrd
images are cpio archive files, which can be manipulated by the cpio utility.
Loading an initrd image is similar to loading a kernel image, only simpler.
There is no specific requirement for the loading address other than a high limit
specified in SETUP. Any reasonable loading address seems to work fine. The
hd-booter loads the Linux kernel to 1MB and initrd to 16MB.
After loading the initrd image, the booter must write the loading address
and initrd size to the loaded SETUP at offsets 24 and 28, respectively. Then it
jumps to execute SETUP at segment 0x9020. Early SETUP code does not care about
the segment register settings. In kernel 2.6, SETUP requires DS=0x9000 upon
entry in order for it to access BOOT as the beginning of its data segment. This
requirement was not well documented. It took me many tries to finally get the
booter to work right.
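Writing the two initrd parameters amounts to storing two 32-bit little-endian
values into the loaded SETUP image. The sketch below operates on an in-memory
copy of SETUP on a host compiler; in the booter, the same bytes would be written
through put_byte()/put_word() style functions. set_initrd_params() and
put_le32() are hypothetical names.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

#define INITRD_ADDR_OFFSET 24   /* offset in SETUP of the initrd load address */
#define INITRD_SIZE_OFFSET 28   /* offset in SETUP of the initrd size */

/* Store a 32-bit value in little-endian byte order. */
void put_le32(u8 *p, u32 v)
{
    p[0] = (u8)(v & 0xFF);
    p[1] = (u8)((v >> 8) & 0xFF);
    p[2] = (u8)((v >> 16) & 0xFF);
    p[3] = (u8)((v >> 24) & 0xFF);
}

/* setup points at the first byte of the loaded SETUP image. */
void set_initrd_params(u8 *setup, u32 load_addr, u32 size)
{
    put_le32(setup + INITRD_ADDR_OFFSET, load_addr);
    put_le32(setup + INITRD_SIZE_OFFSET, size);
}
```

For an initrd loaded at 16MB, load_addr would be 0x01000000 and size would be
the byte size of the initrd image file.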
SAMPLE SOLUTION:
samples/A1/hdbooter.bin
is a bootable FD image for booting Linux bzImage with (or without) initrd.
Download it and dd it to a FD.
Boot from the FD; the booter itself operates on HDs.