MIT 6.828 Lab 1: C, Assembly, Tools, and Bootstrapping

Introduction

This lab is split into three parts:

  1. Getting familiar with x86 assembly language, the QEMU x86 emulator, and the PC’s power-on bootstrap procedure.

  2. Examining the boot loader for our 6.828 kernel.

  3. Looking at the 6.828 kernel itself.

P.S.: The course Git repository is at https://pdos.csail.mit.edu/6.828/2017/jos.git,
and you can run git clone https://pdos.csail.mit.edu/6.828/2017/jos.git to get the lab files.

Part 1: PC Bootstrap

  1. Getting Started with x86 assembly

    Exercise 1.

    • Get familiar with x86 assembly language:
      PC Assembly Language Book (warning: the examples in the book are written for the NASM assembler, which uses Intel syntax, but we will use the GNU assembler, which uses AT&T syntax.)
      Brennan’s Guide to Inline Assembly (this guide gives a good and quite brief description of the AT&T assembly syntax.)
  2. Simulating the x86

    • Get into the /jos directory, then run make to build the minimal 6.828 boot loader and kernel.

       + as kern/entry.S
        + cc kern/entrypgdir.c
        + cc kern/init.c
        + cc kern/console.c
        + cc kern/monitor.c
        + cc kern/printf.c
        + cc kern/kdebug.c
        + cc lib/printfmt.c
        + cc lib/readline.c
        + cc lib/string.c
        + ld obj/kern/kernel
        + as boot/boot.S
        + cc -Os boot/main.c
        + ld boot/boot
        boot block is 380 bytes (max 510)
        + mk obj/kern/kernel.img
      
    • The file obj/kern/kernel.img created above is the contents of the emulated PC’s “virtual hard disk.” This hard disk image contains both our boot loader (obj/boot/boot) and our kernel (obj/kern/kernel). Then run make qemu to boot our kernel:

       Booting from Hard Disk...
        6828 decimal is XXX octal!
        entering test_backtrace 5
        entering test_backtrace 4
        entering test_backtrace 3
        entering test_backtrace 2
        entering test_backtrace 1
        entering test_backtrace 0
        leaving test_backtrace 0
        leaving test_backtrace 1
        leaving test_backtrace 2
        leaving test_backtrace 3
        leaving test_backtrace 4
        leaving test_backtrace 5
        Welcome to the JOS kernel monitor!
        Type 'help' for a list of commands.
        K>
      
    • Only two commands are available in the kernel monitor, help and kerninfo:

       K> help
        help - display this list of commands
        kerninfo - display information about the kernel
        K> kerninfo
        Special kernel symbols:
        entry  f010000c (virt)  0010000c (phys)
        etext  f0101a75 (virt)  00101a75 (phys)
        edata  f0112300 (virt)  00112300 (phys)
        end    f0112960 (virt)  00112960 (phys)
        Kernel executable memory footprint: 75KB
        K>
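
The kerninfo numbers above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the monitor's actual code: it assumes that the kernel is linked at virtual address 0xF0000000 (the name KERNBASE and both helper functions are hypothetical here) and that the footprint is the distance from entry to end, rounded up to whole kilobytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: kernel virtual addresses sit 0xF0000000 above physical ones. */
#define KERNBASE 0xF0000000u

/* Map a kernel virtual address to the physical address shown next to it. */
uint32_t kern_virt_to_phys(uint32_t va)
{
    assert(va >= KERNBASE);          /* only meaningful for kernel addresses */
    return va - KERNBASE;
}

/* Size of [entry, end) in KB, rounded up to the next multiple of 1024 bytes. */
uint32_t footprint_kb(uint32_t entry, uint32_t end)
{
    assert(end >= entry);
    return (end - entry + 1023u) / 1024u;
}
```

With the values printed above, kern_virt_to_phys(0xf010000c) gives 0x0010000c, and footprint_kb(0xf010000c, 0xf0112960) gives 75, matching the reported 75KB.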
      
  3. The PC’s Physical Address Space

    • A PC’s physical address space is hard-wired to have the following general layout:

       +------------------+  <- 0xFFFFFFFF (4GB)
        |      32-bit      |
        |  memory mapped   |
        |     devices      |
        |                  |
        /\/\/\/\/\/\/\/\/\/\
      
        /\/\/\/\/\/\/\/\/\/\
        |                  |
        |      Unused      |
        |                  |
        +------------------+  <- depends on amount of RAM
        |                  |
        |                  |
        | Extended Memory  |
        |                  |
        |                  |
        +------------------+  <- 0x00100000 (1MB)
        |     BIOS ROM     |
        +------------------+  <- 0x000F0000 (960KB)
        |  16-bit devices, |
        |  expansion ROMs  |
        +------------------+  <- 0x000C0000 (768KB)
        |   VGA Display    |
        +------------------+  <- 0x000A0000 (640KB)
        |                  |
        |    Low Memory    |
        |                  |
        +------------------+  <- 0x00000000
      
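    • The first PCs were based on the 16-bit Intel 8088 processor and could address only 1MB of physical memory, from 0x00000000 to 0x000FFFFF. As the diagram shows, the 640KB starting at 0x00000000 is Low Memory (also called conventional memory). The 384KB from 0x000A0000 to 0x000FFFFF is reserved for special uses such as video display buffers. The most important part of this reserved area is the Basic Input/Output System (BIOS), which occupies the 64KB from 0x000F0000 to 0x000FFFFF.

    • The BIOS performs basic system initialization, such as activating the video card and checking the amount of memory installed. After initialization, the BIOS loads the operating system from disk and passes control of the machine to it.

    • Once Intel broke the 1MB barrier, PCs began to support 16MB and then 4GB of physical memory. Modern PCs therefore divide RAM into “Low Memory” (the same first 640KB as before) and “Extended Memory” (everything else). In addition, some address space is reserved by the BIOS for PCI devices.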
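
To make the layout concrete, here is a small sketch that classifies a physical address using the boundaries from the diagram above. The enum and function names are hypothetical and exist only for illustration. Everything above 1MB is lumped into one category because the upper boundaries depend on how much RAM is installed.

```c
#include <assert.h>
#include <stdint.h>

enum pc_region {
    REGION_LOW_MEMORY,      /* 0x00000000 to 0x0009FFFF (640KB) */
    REGION_VGA_DISPLAY,     /* 0x000A0000 to 0x000BFFFF */
    REGION_EXPANSION_ROMS,  /* 0x000C0000 to 0x000EFFFF: 16-bit devices, expansion ROMs */
    REGION_BIOS_ROM,        /* 0x000F0000 to 0x000FFFFF (64KB) */
    REGION_ABOVE_1MB        /* extended memory, unused space, or 32-bit devices */
};

/* Return the region of the diagram that contains physical address pa. */
enum pc_region classify_phys_addr(uint32_t pa)
{
    if (pa < 0x000A0000u) return REGION_LOW_MEMORY;
    if (pa < 0x000C0000u) return REGION_VGA_DISPLAY;
    if (pa < 0x000F0000u) return REGION_EXPANSION_ROMS;
    if (pa < 0x00100000u) return REGION_BIOS_ROM;
    return REGION_ABOVE_1MB;
}

/* A few sanity checks taken from addresses that appear in this lab. */
void region_examples(void)
{
    assert(classify_phys_addr(0x00007c00u) == REGION_LOW_MEMORY);  /* boot sector */
    assert(classify_phys_addr(0x000ffff0u) == REGION_BIOS_ROM);    /* first instruction */
    assert(classify_phys_addr(0x0010000cu) == REGION_ABOVE_1MB);   /* kernel entry */
}
```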

  4. The ROM BIOS

    • Use QEMU to investigate how an IA-32 compatible computer boots.

    • Steps:

      • Open two terminal windows and cd both shells into the lab directory.
      • In one, run make qemu-gdb or make qemu-nox-gdb. This starts QEMU, but QEMU stops just before the processor executes the first instruction and waits for a debugging connection from GDB.
      • In the second terminal, run gdb. You should see something like this:

         GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
          Copyright (C) 2014 Free Software Foundation, Inc.
          License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
          This is free software: you are free to change and redistribute it.
          There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
          and "show warranty" for details.
          This GDB was configured as "i686-linux-gnu".
          Type "show configuration" for configuration details.
          For bug reporting instructions, please see:
          <http://www.gnu.org/software/gdb/bugs/>.
          Find the GDB manual and other documentation resources online at:
          <http://www.gnu.org/software/gdb/documentation/>.
          For help, type "help".
          Type "apropos word" to search for commands related to "word".
          + target remote localhost:25000
          warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
          of GDB.  Attempting to continue with the default i8086 settings.
        
          The target architecture is assumed to be i8086
          [f000:fff0]    0xffff0:	ljmp   $0xf000,$0xe05b
          0x0000fff0 in ?? ()
          + symbol-file kernel
          (gdb) 
        

    Exercise 2.

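    • Use GDB’s si (step instruction) command to trace into the ROM BIOS, and try to get a general idea of what the BIOS does first.
    • A brief outline of the PC boot sequence:

      • When the PC starts, CS = 0xf000 and IP = 0xfff0, so the first instruction executes at physical address 0x000ffff0.
      • The first instruction is ljmp $0xf000, $0xe05b, a jump to the segmented address CS = 0xf000, IP = 0xe05b. The BIOS region ends at 0x100000, so the starting address is only 16 bytes from the end, which leaves no room for useful code. That is why the first instruction is a jump to an earlier location in the BIOS.
      • When the BIOS runs, it first sets up an interrupt descriptor table (IDT) and initializes various devices and the PCI bus.
      • Finally, the BIOS searches for a bootable device, reads the boot loader from it, and transfers control to the boot loader.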
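    • Real-mode addressing:

      • In real mode (the mode the PC starts in), a segmented address is translated by the formula:
        $physical\ address = 16 * segment + offset$.
        So when the PC sets CS (the code segment register) to 0xf000 and IP (the instruction pointer) to 0xfff0, the physical address is:
        $16 * 0xf000 + 0xfff0$
        $= 0xf0000 + 0xfff0$
        $= 0xffff0$
      • In real mode, segment registers and offsets are only 16 bits wide. Memory is viewed as a set of segments, with code and data living in different segments, and every address refers directly to physical memory.
      • Enabling the A20 address line is one step in leaving real mode; the switch to protected mode itself happens when the boot loader sets the protection-enable bit in the CR0 register.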
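
The real-mode formula above is easy to check in C. This is a minimal sketch with a hypothetical helper name, real_mode_phys. The checks use the reset address and the ljmp target from the GDB session, plus the 0000:7c00 address where the boot sector is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* physical address = 16 * segment + offset (multiplying by 16 is a 4-bit shift) */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

void real_mode_examples(void)
{
    assert(real_mode_phys(0xf000, 0xfff0) == 0xffff0u); /* first instruction after power-on */
    assert(real_mode_phys(0xf000, 0xe05b) == 0xfe05bu); /* target of the first ljmp */
    assert(real_mode_phys(0x0000, 0x7c00) == 0x07c00u); /* boot sector load address */
}
```

Note that different segment:offset pairs can name the same physical byte; for example 0x07c0:0x0000 also yields 0x7c00.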

Part 2: The Boot Loader

  1. Boot Loader

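    A PC’s hard disk or floppy disk is divided into 512-byte regions called sectors. A sector is the minimum granularity of disk I/O: every read or write covers one or more whole sectors. If a disk is bootable, its first sector is the boot sector, which holds the boot loader. When the BIOS finds a bootable disk, it loads the boot sector into memory at physical addresses 0x7c00 through 0x7dff, then executes a jmp instruction that sets CS:IP to 0000:7c00, passing control to the boot loader.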
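
The sector arithmetic in this paragraph can be sketched as follows. The helper names are hypothetical. The signature check is an addition not stated in the text above: it relies on the PC convention that a bootable sector ends with the bytes 0x55 0xAA, which is consistent with the build output reporting a 510-byte maximum for the boot block.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE    512u
#define BOOT_LOAD_ADDR 0x7c00u

/* Address of the last byte of the boot sector once it is loaded in memory. */
uint32_t boot_sector_last_byte(void)
{
    return BOOT_LOAD_ADDR + SECTOR_SIZE - 1u;
}

/* Number of whole sectors needed to hold nbytes (I/O is sector-granular). */
uint32_t sectors_needed(uint32_t nbytes)
{
    return (nbytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}

/* Assumption (not from the text above): the last two bytes are 0x55, 0xAA. */
int has_boot_signature(const uint8_t *sector)
{
    assert(sector != 0);
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

For example, boot_sector_last_byte() is 0x7dff, and sectors_needed(380) is 1, so the 380-byte boot block from the build output fits in the single boot sector.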

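    The 6.828 boot loader consists of one assembly source file, boot/boot.S, and one C source file, boot/main.c. It performs two main functions:

    • First, the boot loader switches the processor from real mode to 32-bit protected mode, because only in this mode can software access memory above 1MB in the processor’s physical address space. In protected mode, segmented addresses are translated to physical addresses differently than in real mode, and after the switch offsets are 32 bits wide instead of 16.

    • Second, the boot loader reads the kernel directly from the disk.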

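    After you understand the boot loader source code, look at its disassembly in obj/boot/boot.asm, and at the kernel’s disassembly in obj/kern/kernel.asm.
    In GDB, the b command sets a breakpoint, for example b *0x7c00. Once stopped at a breakpoint, c continues execution and si steps a single instruction; si N steps N instructions at a time.
    The x/i command examines (disassembles) instructions in memory.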

    Exercise 3.

    • Set a breakpoint at address 0x7c00, which is where the boot sector will be loaded. Continue execution until that breakpoint. Trace through the code in boot/boot.S, using the source code and the disassembly file obj/boot/boot.asm to keep track of where you are. Also use the x/i command in GDB to disassemble sequences of instructions in the boot loader, and compare the original boot loader source code with both the disassembly in obj/boot/boot.asm and GDB.
    • Trace into bootmain() in boot/main.c, and then into readsect(). Identify the exact assembly instructions that correspond to each of the statements in readsect(). Trace through the rest of readsect() and back out into bootmain(), and identify the begin and end of the for loop that reads the remaining sectors of the kernel from the disk. Find out what code will run when the loop is finished, set a breakpoint there, and continue to that breakpoint. Then step through the remainder of the boot loader.

    Question 1 : At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?

    • Answer: Looking at boot.asm, we find:

       # Jump to next instruction, but in 32-bit code segment.
        # Switches processor into 32-bit mode.
        ljmp    $PROT_MODE_CSEG, $protcseg
            7c2d:	ea 32 7c 08 00 66 b8 	ljmp   $0xb866,$0x87c32
      
        00007c32 <protcseg>:
      
        .code32                     # Assemble for 32-bit mode
      

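      So the processor starts executing 32-bit code at 0x00007c32 (the protcseg label), and the ljmp at 0x7c2d completes the switch from real mode to protected mode. The disassembler decoded that instruction as if it were 32-bit code, which is why it prints ljmp $0xb866,$0x87c32. The processor is still decoding 16-bit code at that point, so the bytes ea 32 7c 08 00 really encode ljmp $0x8,$0x7c32. The switch is enabled just before the jump, when boot.S sets the protection-enable bit in %cr0; the far jump then reloads CS with the 32-bit code segment selector.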
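
The odd operands in the disassembly can be reproduced by decoding the same bytes two ways. This is a minimal sketch with hypothetical names (struct far_jump, decode_ljmp). It handles only the far-jump opcode 0xEA, whose operand is an offset followed by a 16-bit segment selector, both little-endian.

```c
#include <assert.h>
#include <stdint.h>

struct far_jump {
    uint16_t selector;  /* segment selector loaded into CS */
    uint32_t offset;    /* new instruction pointer */
};

/* Decode a far jump (opcode 0xEA) assuming 16-bit or 32-bit operand size. */
struct far_jump decode_ljmp(const uint8_t *code, int operand_bits)
{
    struct far_jump j;

    assert(code[0] == 0xEA);
    assert(operand_bits == 16 || operand_bits == 32);

    if (operand_bits == 16) {
        /* ea, 2-byte offset, 2-byte selector */
        j.offset   = (uint32_t)code[1] | ((uint32_t)code[2] << 8);
        j.selector = (uint16_t)(code[3] | (code[4] << 8));
    } else {
        /* ea, 4-byte offset, 2-byte selector */
        j.offset   = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                     ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
        j.selector = (uint16_t)(code[5] | (code[6] << 8));
    }
    return j;
}
```

Feeding it the bytes ea 32 7c 08 00 66 b8 from the listing gives selector 0x0008 and offset 0x7c32 in 16-bit mode, but selector 0xb866 and offset 0x87c32 in 32-bit mode, which is exactly what the disassembly shows. The trailing 66 b8 bytes appear to belong to the next instruction.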

    Question 2 : What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?

    • Answer: In boot.asm, the last instruction executed by the boot loader is at the end of the function void bootmain(void):

       // call the entry point from the ELF header
        // note: does not return!
        ((void (*)(void)) (ELFHDR->e_entry))();
        7d61:	ff 15 18 00 01 00    	call   *0x10018
      

      In kernel.asm, the first instruction executed by the kernel is:

       entry:
        movw	$0x1234,0x472			# warm boot
      

    Question 3 : Where is the first instruction of the kernel?

    • Answer: In the obj/kern/ directory, the command objdump -f kernel prints the file header summary of the kernel executable:

       kernel:     file format elf32-i386
        architecture: i386, flags 0x00000112:
        EXEC_P, HAS_SYMS, D_PAGED
        start address 0x0010000c
      

      So the kernel’s first instruction is at address 0x0010000c.
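
The call *0x10018 in the boot loader and the start address reported by objdump are connected through the ELF header layout. The sketch below assumes the standard 32-bit ELF header field order. The struct and field names are illustrative, and the 0x10000 load address for the header is an assumption inferred from the call operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit ELF file header, fields in standard order (names are illustrative). */
struct Elf {
    uint32_t e_magic;      /* 0x7F 'E' 'L' 'F' */
    uint8_t  e_elf[12];    /* rest of the identification bytes */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;      /* address of the first instruction to run */
    uint32_t e_phoff;      /* file offset of the program header table */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;      /* number of program headers */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* Assumption: the boot loader places the ELF header at physical 0x10000. */
#define ELFHDR_ADDR 0x10000u

/* Memory address at which the e_entry field ends up. */
uint32_t entry_field_address(void)
{
    return ELFHDR_ADDR + (uint32_t)offsetof(struct Elf, e_entry);
}

void entry_examples(void)
{
    assert(offsetof(struct Elf, e_entry) == 0x18);
    assert(entry_field_address() == 0x10018u);
}
```

Under these assumptions, call *0x10018 is an indirect call through the e_entry field, whose value is the 0x0010000c start address shown by objdump -f.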

    Question 4 : How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?

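    • Answer: The boot loader reads the ELF header to locate the program header table. Each entry in that table describes one segment using three key fields:
      • p_pa (physical load address)
      • p_memsz (size in memory)
      • p_offset (offset within the file)
        For each segment, the boot loader reads p_memsz bytes starting at file offset p_offset (aligned to sector boundaries) into memory starting at p_pa. The command objdump -p kernel shows the program headers:
       kernel:     file format elf32-i386
      
        Program Header: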
            LOAD off    0x00001000 vaddr 0xf0100000 paddr 0x00100000 align 2**12
                filesz 0x0000716c memsz 0x0000716c flags r-x
            LOAD off    0x00009000 vaddr 0xf0108000 paddr 0x00108000 align 2**12
                filesz 0x0000a300 memsz 0x0000a944 flags rw-
            STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
                filesz 0x00000000 memsz 0x00000000 flags rwx
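
The sector count for each segment follows from those three fields. The sketch below is modeled on that description rather than copied from boot/main.c; the names seg_read and plan_segment_read are hypothetical. It assumes 512-byte sectors and that the kernel image starts at disk sector 1, right after the boot sector.

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512u

struct seg_read {
    uint32_t first_sector;  /* first disk sector to read */
    uint32_t nsectors;      /* how many sectors cover the segment */
};

/* Plan the disk reads for a segment: count bytes at file offset `offset`,
 * destined for physical address `pa`. */
struct seg_read plan_segment_read(uint32_t pa, uint32_t count, uint32_t offset)
{
    struct seg_read r;
    uint32_t end_pa, start_pa;

    assert(pa + count >= pa);                  /* no 32-bit wraparound */
    end_pa   = pa + count;
    start_pa = pa & ~(SECTSIZE - 1u);          /* round down to a sector boundary */

    r.first_sector = offset / SECTSIZE + 1u;   /* assumption: kernel begins at sector 1 */
    r.nsectors     = (end_pa - start_pa + SECTSIZE - 1u) / SECTSIZE;
    return r;
}
```

With the two LOAD entries above, the first segment (paddr 0x00100000, memsz 0x716c, off 0x1000) needs 57 sectors starting at sector 9, and the second (paddr 0x00108000, memsz 0xa944, off 0x9000) needs 85 sectors starting at sector 73.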
      
  2. Loading the Kernel

    We will now look in further detail at the C language portion of the boot loader, in boot/main.c.

    Exercise 4.

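    • Read about programming with pointers in C. The best reference for the C language is “K&R”.
    • What is ELF?
      When you compile and link a C program, the compiler first turns each .c source file into a .o object file (a binary file), and the linker then combines all the compiled object files into a single binary in ELF format, i.e., “Executable and Linkable Format”.
      An ELF executable usually consists of a fixed-length header with loading information, followed by several variable-length program sections that hold code or data:

      • .text : the program’s executable instructions
      • .rodata : read-only data, such as the ASCII string constants produced by the C compiler
      • .data : the program’s initialized data, such as global variables declared with an initializer

      When the linker computes the program’s memory layout, it reserves space for uninitialized global variables in the .bss section, which follows .data. C requires uninitialized global variables to start with the value 0.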
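
As a small illustration of where C objects typically end up, consider the sketch below. The variable and function names are hypothetical, and the section noted in each comment is what a typical GCC/ELF toolchain does rather than something the C language guarantees. What C does guarantee is that the uninitialized global starts out as zero.

```c
#include <assert.h>

const char banner[] = "JOS";   /* read-only data: typically .rodata */
int boot_count = 3;            /* initialized global: typically .data */
int scratch[64];               /* uninitialized global: typically .bss */

/* Return 1 if every element of scratch still holds its initial value, 0. */
int scratch_is_zeroed(void)
{
    for (unsigned i = 0; i < sizeof scratch / sizeof scratch[0]; i++) {
        if (scratch[i] != 0)
            return 0;
    }
    return 1;
}

void section_examples(void)
{
    assert(banner[0] == 'J');
    assert(boot_count == 3);
    assert(scratch_is_zeroed());
}
```

Because .bss contents are all zero, they do not need to be stored in the file. That is consistent with the second LOAD segment in the objdump -p output earlier, where memsz (0xa944) is 0x644 bytes larger than filesz (0xa300).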

    • Some useful commands:

      • objdump -h obj/kern/kernel : examine the full list of the names, sizes, and link addresses of all the sections in the kernel executable.
      • objdump -h obj/boot/boot.out : look at the .text section of the boot loader.
      • objdump -x obj/kern/kernel : inspect the program headers.

Summary

fighting.