Justin Spencer: xv6: a simple, Unix-like teaching operating system

The job of an operating system is to share a computer among multiple programs and to provide a more useful set of services than the hardware alone supports.
An operating system provides services to user programs through an interface.
Each running program, called a process, has memory containing instructions, data, and a stack. The instructions implement the program's computation. The data are the variables on which the computation acts. The stack organizes the program's procedure calls.
When a process needs to invoke a kernel service, it invokes a procedure call in the operating system interface. Such a procedure call is called a system call. The system call enters the kernel; the kernel performs the service and returns.
The kernel uses the CPU's hardware protection mechanisms to ensure that each process executing in user space can access only its own memory.
The shell is an ordinary program that reads commands from the user and executes them, and is the primary user interface to traditional Unix-like systems.
An xv6 process consists of user-space memory (instructions, data, and stack) and per-process state private to the kernel.
The exec system call replaces the calling process's memory with a new memory image loaded from a file stored in the file system.
A process that needs more memory at run-time (perhaps for malloc) can call sbrk(n) to grow its data memory by n bytes; sbrk returns the location of the new memory.
A file descriptor is a small integer representing a kernel-managed object that a process may read from or write to.
Internally, the xv6 kernel uses the file descriptor as an index into a per-process table, so that every process has a private space of file descriptors starting at zero.
By convention, a process read from file descriptor 0 (standard input), writes to file descriptor 1 (standard output), and writes error messages to file descriptor 2 (standard error).
The read and write system calls read bytes from and write bytes to open files named by file descriptors.
Each file descriptor that refers to a file has an offset associated with it. Read reads data from the current file offset and then advances that offset by the number of bytes read: a subsequent read will return the bytes following the ones returned by the first read.
Like read, write writes data at the current file offset and then advances that offset by the number of bytes written: each write picks up where the previous one left off.
A newly allocated file descriptor is always the lowest-numbered unused descriptor of the current process.
Fork copies the parent's file descriptor table along with its memory, so that the child starts with exactly the same open files as the parent.
The dup system call duplicates an existing file descriptor, returning a new one that refers to the same underlying I/O object.
File descriptors are a powerful abstraction, because they hide the details of what they are connected to: a process writing to file descriptor 1 may be writing to a file, to a device like the console, or to a pipe.
A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Writing data to one end of the pipe makes that data available for reading from the other end of the pipe.
Pipes provide a way for processes to communicate.
Pipes allow for synchronization: two processes can use a pair of pipes to send messages back and forth to each other, with each read blocking its calling process until the other process has sent data with write.
A file's name is distinct from the file itself; the same underlying file, called an inode, can have multiple names, called links.
The link system call creates another file system name referring to the same inode as an existing file.
Each inode is identified by a unique inode number.
Unix's combination of the "standard" file descriptors, pipes, and convenient shell syntax for operations on them was a major advance in writing general-purpose reusable programs.
The authors of Unix went on to build Plan 9, which applied the "resources are files" concept to modern facilities, representing networks, graphics, and other resources as files or file trees.
Any operating system must multiplex processes onto the underlying hardware, isolate processes from each other, and provide mechanisms for controlled inter-process communication.
The implementation of an operating system must achieve three requirements: multiplexing, isolation, and interaction.
To achieve strong isolation a helpful approach is to disallow applications to have direct access to the hardware resources, but instead to abstract the resources into services.
In kernel mode, the processor is allowed to execute privileged instructions.
The software running in kernel space (or in kernel mode) is called the kernel.
A key design question for an operating system is what part of the operating system should run in kernel mode.
The unit of isolation in xv6 is a process.
A process is an abstraction that provides the illusion to a program that it has its own abstract machine.
The x86 page table translates (or "maps) a virtual address (the address that an x86 instruction manipulates) to a physical address (an address that the processor chip sends to main memory).
The xv6 kernel maintains many pieces of state for each process, which it gathers into a struct proc. A process's most important pieces of kernel state are its page table, its kernel stack, and its run state.
Each process has two stacks: a user stack and a kernel stack.
When a process makes a system call, the processor switches to the kernel stack, raises the hardware privilege level, and starts executing the kernel instructions that implement the system call. When the system call completes, the kernel returns to user space: the hardware lowers its privilege level, switches back to the user stack, and resumes executing user instructions just after the system call instruction.
Page tables are the mechanism through which the operating system controls what memory addresses mean.
An x86 page table is logically an array of 2^20 page table entries (PTEs). Each PTE contains a 20-bit physical page number (PPN) and some flags. The paging hardware translates a virtual address by using its top 20 bits with the PPN in the PTE. The paging hardware copies the low 12 bits unchanged from the virtual to the translated physical address.
A page table is stored in physical memory as a two-level tree. [...] This two-level structure allows a page table to omit entire page table pages in the common case in which large ranges of virtual addresses have no mappings.
Physical memory refers to storage cells in DRAM. A byte of physical memory has an address, called a physical address. Instructions use only virtual addresses, which the paging hardware translates to physical addresses, and then sends to the DRAM hardware to read or write storage.
Each process has a separate page table, and xv6 tells the page table hardware to switch page tables when xv6 switches between processes.
Different processes' page tables translate user addresses to different pages of physical memory, so that each process has private user memory.
To guard a stack growing off the stack page, xv6 places a guard page right below the stack. The guard page is not mapped and so if the stack runs off the stack page, the hardware will generate an exception because it cannot translate the faulting address.
Exec is the system call that creates the user part of an address space. It initializes the user part of an address space from a file stored in the file system.
If the ELF header has the right magic number, exec assumes that the binary is well-formed.
Small pages make sens when physical memory is small, to allow allocation and page-out to disk with fine granularity.
Larger pages make sense on machines with lots of RAM, and may reduce overhead for page-table manipulation.
Today people care more about speed than space-efficiency.
When running a process, a CPU executes the normal processor loop: read an instruction, advance the program counter, execute the instruction, repeat.
The term exception refers to an illegal program action that generates an interrupt.
The term interrupt refers to a signal generated by a hardware device, indicating that it needs attention of the operating system.
The kernel handles all interrupts, rather than processes handling them, because in most cases only the kernel has the required privilege and state.
An interrupt stops the normal processor loop and starts executing a new sequence called an interrupt handler.
It is important to remember that traps are caused by the current process running on a processor, and interrupts are caused by devices and may not be related to the currently running process.
The x86 has 4 protection levels, numbered 0 (most privileged) to 3 (least privilege). In practice, most operating systems use only 2 levels: 0 and 3, which are then called kernel mode and user mode, respectively.
The current privilege level with which the x86 executes instructions is stored in the %cs register, in the field CPL.
On the x86, interrupt handlers are defined in the interrupt descriptor table (IDT). The IDT has 256 entries, each giving the %cs and %eip to be used when handling the corresponding interrupt.
To make a system cal on the x86, a program invokes the int n instruction, where n specifies the index into the IDT.
An operating system can use the iret instruction to return from an int instruction. It pops the saved values during the int instruction from the stack, and resumes execution at the saved %eip.
System calls conventionally return negative numbers to indicate errors, positive numbers for success.
Interrupts are similar to system calls, except devices generate them at any time.
A processor can control if it wants to receive interrupts through the IF flag in the eflags register. The instruction cli disables interrupts on the process by clearing IF, and sti enables interrupts on a processor.
xv6 disables interrupts during booting of the main cpu and the other processors.
A driver is the piece of code in an operating system that manages a particular device: it provides interrupt handlers for a device, causes a device to perform operations, cases a device to generate interrupts, etc.
Driver code can be tricky to write because a driver executes concurrently with the device that it manages.
In many operating systems, the drivers together account for more code in the operating system than the core kernel.
Typically devices are slower than CPU, so the hardware uses interrupts to notify the operating system of status changes.
Using DMA means that the CPU is not involved at all in the [data] transfer, which can be more efficient and is less taxing for the CPU's memory caches.
All modern devices are programmed using memory-mapped I/O.
A lock provides mutual exclusion, ensuring that only one CPU at a time can hold the lock.
In one atomic operation, xchg swaps a word in memory with the contents of a register.
Interrupts can cause concurrency even on a single processor: if interrupts are enabled, kernel code can be stopped at any moment to run an interrupt handler instead.
It is possible to implement locks without atomic instructions, but it is expensive, and most operating systems use atomic instructions.
Any operating system is likely to run with more processes than the computer has processors, and so a plan is needed to time-share the processors among the processes.
Switching from one thread to another involves saving the old thread's CPU registers, and restoring previously-saved registers of the new thread; the fact that %esp and %eip are saved and restored means that the CPU will switch stacks and switch what code it is executing.
Each pipe is represented by a struct pipe, which contains a lock and a data buffer.
The xv6 scheduler implements a simple scheduling policy, which runs each process in turn. This policy is called round robin.
A semaphore is an integer value with two operations, increment and decrement (or up and down). It is always possible to increment a semaphore, but the semaphore value is not allowed to drop below zero.
The purpose of a file system is to organize and store data.
One of the most interesting problems in file system design is crash recovery. The problem arises because many file system operations involve multiple writes to the disk, and a crash after a subset of the writes may leave the on-disk file system in an inconsistent state.
One of the cool aspects of the Unix interface is that most resources in Unix are represented as a file, including devices such as the console, pipes, and of course, real files. The file descriptor layer is the layer that achieves this uniformity.
All the open files in the system are kept in a global file table, the ftable.
From our point of view, we can abstract the PC into three components: CPU, memory, and input/output (I/O) devices. The CPU performs computation, the memory contains instructions and data for that computation, and devices allow the CPU to interact with hardware for storage, communication, and other functions.
A computer's CPU runs a conceptually simple loop: it consults an address in a register called the program counter, reads a machine instruction from that address in memory, advances the program counter past the instruction, and executes the instruction. Repeat.
A register is a storage cell inside the processor itself, capable of holding a machine word-sized value.
The modern x86 provides eight general purpose 32-bit registers: eax, ebx, ecx, edx, edi, esi, ebp, and esp--and a program counter eip.
Registers are fast but expensive.
Main memory is 10-100x slower than a register, but it is much cheaper, so there can be more of it. One reason main memory is relatively slow is that it is physically separate from the processor chip.
The x86 processor provides special in and out instructions that read and write values from device address called I/O ports.

Justin Spencer

Pages

20181205

xv6: a simple, Unix-like teaching operating system

No comments:

Post a Comment