Pages

20181129

seL4 Reference Manual

  • The seL4 microkernel is an operating-system kernel designed to be a secure, safe, and reliable foundation for systems in a wide variety of application domains.
  • As a microkernel, it provides a small number of services to applications, such as abstractions to create and manage virtual address spaces, threads, and inter-process communication (IPC).
  • A limited number of service primitives are provided by the microkernel; more complex services may be implemented as applications on top of these primitives.
  • Threads are an abstraction of CPU execution that supports running software.
  • Address spaces are virtual memory spaces that each contain an application. Applications are limited to accessing memory in their address space.
  • Inter-process communication (IPC) via endpoints allows threads to communicate using message passing.
  • Notifications provide a non-blocking signalling mechanism similar to binary semaphores.
  • Device primitives allow device drivers to be implemented as unprivileged applications. The kernel exports hardware device interrupts via IPC messages.
  • Capability spaces store capabilities (i.e. access rights) to kernel services along with their book-keeping information.
  • The seL4 microkernel provides a capability-based access-control model. Access control governs all kernel services; in order to perform an operation, an application must invoke a capability in its possession that has sufficient access rights for the requested service.
  • A capability is an unforgeable token that references a specific kernel object (such as a thread control block) and carries rights that control what methods may be invoked.
  • Conceptually, a capability resides in an application's capability space; an address in this space refers to a slot which may or may not contain a capability.
  • Capability spaces are implemented as a directed graph of kernel-managed capability nodes.
  • Capabilities can also be revoked to withdraw authority. Revocation recursively removes any capabilities that have been derived from the original capability being revoked.
  • The seL4 kernel provides a message-passing service for communication between threads. This mechanism is also used for communication with kernel-provided services.
  • Logically, the kernel provides three system calls: send, receive, and yield. However, there are also combinations and variants of the basic send and receive calls.
  • seL4_Send() delivers a message through the named capability and then allows the application to continue (see the endpoint IPC sketch at the end of this list).
  • seL4_Recv() is used by a thread to receive messages through endpoints or notifications.
  • seL4_Yield() is the only system call that does not require a capability to be used. It forfeits the remainder of the calling thread's time slice and causes invocation of the kernel's scheduler.
  • CNodes store capabilities, giving a thread permission to invoke methods on particular objects.
  • Thread Control Blocks represent a thread of execution in seL4.
  • Endpoints facilitate message-passing communication between threads.
  • IPC is synchronous: A thread trying to send or receive on an endpoint blocks until the message can be delivered.
  • Notification Objects provide a simple signalling mechanism. A notification is a word-sized array of flags, each of which behaves like a binary semaphore.
  • Virtual Address Space Objects are used to construct a virtual address space for one or more threads.
  • Interrupt Objects give applications the ability to receive and acknowledge interrupts from hardware devices.
  • Untyped Memory is the foundation of memory allocation in the seL4 kernel.
  • The seL4 microkernel does not dynamically allocate memory for kernel objects. Instead, objects must be explicitly created from application-controlled memory regions via Untyped Memory capabilities (a retype sketch appears at the end of this list).
  • There are no arbitrary resource limits in the kernel apart from those dictated by the hardware, and so many denial-of-service attacks via resource exhaustion are avoided.
  • At boot time, seL4 pre-allocates the memory required for the kernel itself, including the code, data, and stack sections (seL4 is a single kernel-stack operating system). It then creates an initial user thread (with an appropriate address space and capability space). The kernel then hands all remaining memory to the initial thread in the form of capabilities to Untyped Memory, and some additional capabilities to kernel objects that were required to bootstrap the initial thread.
  • Each user-space thread has an associated capability space (CSpace) that contains the capabilities that the thread possesses, thereby governing which resources the thread can access.
  • A CNode is a table of slots, each of which may contain a capability. This may include capabilities to further CNodes, forming a directed graph.
  • seL4 requires the programmer to manage all in-kernel data structures, including CSpaces, from user-space. This means that the user-space programmer is responsible for constructing CSpaces as well as addressing capabilities within them.
  • Capabilities are managed largely through invoking CNode methods.
  • Some capability types have access rights associated with them. The access rights associated with a capability determine the methods that can be invoked.
  • seL4 supports three orthogonal access rights, which are Read, Write, and Grant.
  • Like a virtual memory address, a capability address is simply an integer. Rather than referring to a location of physical memory (as does a virtual memory address), a capability address refers to a capability slot.
  • The seL4 microkernel provides a message-passing IPC mechanism for communication between threads. The same mechanism is also used for communication with kernel-provided services.
  • Messages are sent by invoking a capability to a kernel object.
  • Endpoint capabilities may be minted to create a new endpoint capability with a badge attached to it, a data word chosen by the invoker of the mint operation.
  • Notifications are a simple, non-blocking signalling mechanism that logically represents a set of binary semaphores.
  • A Notification object contains a single data word, called the notification word.
  • seL4 provides threads to represent an execution context and manage processor time. A thread is represented in seL4 by its thread control block object (TCB). Each TCB has an associated CSpace and VSpace which may be shared with other threads.
  • On multicore machines, threads run on the CPU that originally created the TCB.
  • seL4 uses a preemptive round-robin scheduler with 256 priority levels. All threads have a maximum controlled priority (MCP) and a priority, the latter being the effective priority of the thread.
  • Each thread has an associated exception-handler endpoint. If the thread causes an exception, the kernel creates an IPC message with the relevant details and sends this to the endpoint. The handler can then take appropriate action.
  • A thread's actions may result in a fault. Faults are delivered to the thread's exception handler so that it can take the appropriate action. The fault type is specified in the message label.
  • User exceptions are used to deliver architecture-defined exceptions.
  • Debug exceptions are used to deliver trace and debug related events to threads.
  • Domains are used to isolate independent subsystems, so as to limit information flow between them. The kernel switches between domains according to a fixed, time-triggered schedule.
  • A thread belongs to exactly one domain, and will only run when that domain is active.
  • A virtual address space in seL4 is called a VSpace.
  • Common to every architecture is the Page, representing a frame of physical memory.
  • A Page object corresponds to a frame of physical memory that is used to implement virtual memory pages in a virtual address space.
  • Page faults are reported to the exception handler of the faulting thread.
  • Interrupts are delivered as notifications. A thread may configure the kernel to signal a particular Notification object each time a certain interrupt triggers.
  • IRQHandler capabilities represent the ability of a thread to configure a certain interrupt.
  • Access to I/O ports is controlled by IO Port capabilities. Each IO Port capability identifies a range of ports that can be accessed with it.
  • I/O devices capable of DMA present a security risk because the CPU's MMU is bypassed when the device accesses memory.
  • The seL4 kernel creates a minimal boot environment for the initial thread. This environment consists of the initial thread's TCB, CSpace, and VSpace, together with frames that contain the user-land image and the IPC buffer.
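
A minimal sketch of the endpoint IPC described in the notes above, written against the libsel4 C API. The capability ep_cap and the badge value CLIENT_BADGE are placeholders assumed to have been set up by a parent task, and exact call signatures vary between kernel versions and configurations.

    #include <sel4/sel4.h>

    /* Placeholder capabilities, assumed to have been set up by a parent task:
       ep_cap names the endpoint, and CLIENT_BADGE is the badge minted onto the
       client's copy of that endpoint capability. */
    extern seL4_CPtr ep_cap;
    #define CLIENT_BADGE 1

    /* Client side: put a word in a message register, send, block for the reply. */
    seL4_Word client_request(seL4_Word x) {
        seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 1);  /* label 0, 1 word */
        seL4_SetMR(0, x);
        info = seL4_Call(ep_cap, info);      /* send and wait for the reply */
        return seL4_GetMR(0);
    }

    /* Server side: block for a request, identify the sender by its badge, reply. */
    void server_loop(void) {
        for (;;) {
            seL4_Word badge;
            seL4_Recv(ep_cap, &badge);       /* blocks until a message arrives */
            if (badge == CLIENT_BADGE) {
                seL4_SetMR(0, seL4_GetMR(0) + 1);
                seL4_Reply(seL4_MessageInfo_new(0, 0, 0, 1));
            }
        }
    }

seL4_Call() combines the send with a wait for the reply, which is the usual client side of the synchronous IPC described in the notes.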
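
The Untyped Memory notes above describe creating kernel objects by retyping application-controlled memory. A hedged sketch of carving one Endpoint object out of an untyped region with seL4_Untyped_Retype(); the slot names are placeholders and the argument conventions differ slightly across kernel versions.

    #include <sel4/sel4.h>

    /* Placeholder capabilities, assumed to exist in the caller's CSpace. */
    extern seL4_CPtr untyped_cap;   /* an Untyped Memory capability             */
    extern seL4_CPtr cspace_root;   /* the root CNode of the caller's CSpace    */
    #define DEST_SLOT 10            /* a hypothetical empty slot for the result */

    /* Carve one Endpoint object out of the untyped region; the new capability
       is written into DEST_SLOT of cspace_root.  Returns 0 (seL4_NoError) on
       success. */
    int make_endpoint(void) {
        return seL4_Untyped_Retype(untyped_cap,
                                   seL4_EndpointObject, 0,   /* object type, size_bits */
                                   cspace_root, 0, 0,        /* destination CNode, index, depth */
                                   DEST_SLOT, 1);            /* destination offset, object count */
    }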

20181121

Programming Paradigms for Dummies Peter Van Roy


  • More is not better (or worse) than less, just different.
  • Solving a programming problem requires choosing the right concepts.
  • A programming paradigm is an approach to programming a computer based on a mathematical theory or a coherent set of principles.
  • A language should ideally support many concepts in a well-factored way, so that the programmer can choose the right concepts whenever they are needed without being encumbered by the others.
  • The first key property of a paradigm is whether or not it can express observable nondeterminism.
  • We recall that nondeterminism is when the execution of a program is not completely determined by its specification, i.e., at some point during the execution the specification allows the program to choose what to do next.
  • The second key property of a paradigm is how strongly it supports state.
  • State is the ability to remember information, or more precisely, to store a sequence of values in time.
  • Computer programming permits the construction of the most complex systems.
  • A programming language is not designed in a vacuum, but for solving certain kinds of problems.
  • Declarative programming is at the very core of programming languages.
  • Declarative programming will stay at the core for the foreseeable future.
  • Deterministic concurrency is an important form of concurrent programming that should not be ignored.
  • Message-passing concurrency is the correct default for general-purpose concurrent programming instead of shared-state concurrency.
  • The ultimate software system is one that does not require any human assistance. Such a system is called self-sufficient.
  • Programming paradigms are built out of programming concepts.
  • A record is a data structure: a group of references to data items with indexed access to each item.
  • The record is the foundation of symbolic programming.
  • Many important data structures such as arrays, lists, strings, trees, and hash tables can be derived from records.
  • When combined with closures, records can be used for component-based programming.
  • The lexically scoped closure is an enormously powerful concept that is at the heart of programming.
  • From an implementation viewpoint, a closure combines a procedure with its external references; a small sketch of this pairing (in C) appears at the end of this list.
  • Many abilities normally associated with specific paradigms are based on closures.
  • Component-based programming is a style of programming in which programs are organized as components, where each component may depend on other components.
  • To implement independence we need a new programming concept called concurrency. When two parts do not interact at all, we say they are concurrent.
  • Concurrency is a language concept, and parallelism is a hardware concept.
  • The fundamental difference between processes and threads is how resource allocation is done.
  • The operating system’s chief role is to arbitrate the resource requests done by all the processes and to allocate resources in a fair way.
  • Despite their popularity, monitors are the most difficult concurrency primitive to program with.
  • State introduces an abstract notion of time in programs.
  • A good rule is that named state should never be invisible: there should always be some way to access it from the outside.
  • The main advantage of named state is that a program can become modular. The main disadvantage is that the program can become incorrect.
  • A user of a data abstraction does not need to understand how the abstraction is implemented.
  • Object-oriented programming, as it is usually understood, is based on data abstraction with polymorphism and inheritance.
  • In computer programming, we say an entity is polymorphic if it can take arguments of different types. This ability is very important for organizing large programs so that the responsibilities of the program’s design are concentrated in well-defined places instead of being spread out over the whole program.
  • We recommend using composition instead of inheritance.
  • One of the major problems of concurrent programming is nondeterminism.
  • Debugging and reasoning about programs with race conditions is very difficult.
  • In lazy execution, it is the consumer of a result that decides whether or not to perform a calculation, not the producer.
  • Lazy execution does the least amount of calculation needed to get a result.
  • Decades of research show that parallel programming cannot be completely hidden from the programmer: it is not possible in general to automatically transform an arbitrary program into a parallel program.
  • The best we can do is make parallel programming as easy as possible.
  • Repeated code is a source of errors: if one copy is fixed, all copies have to be fixed.
  • The programming language and its libraries should help not hinder the programmer.
  • Programming languages should support several paradigms because different problems require different concepts to solve them.
  • Each paradigm has its own “soul” that can only be understood by actually using the paradigm.
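
The closure notes above describe a closure as a procedure combined with its external references. Even in C, which lacks closures, that pairing can be modeled with a function pointer plus a struct of captured variables; the names below are illustrative, not from the paper.

    #include <stdio.h>

    /* The "external references": the variables the closure has captured. */
    struct counter_env {
        int count;
    };

    /* The procedure part, taking its environment explicitly. */
    static int next_value(struct counter_env *env) {
        return ++env->count;
    }

    /* A closure is the pair (procedure, environment). */
    struct counter_closure {
        int (*proc)(struct counter_env *);
        struct counter_env env;
    };

    int main(void) {
        struct counter_closure a = { next_value, { 0 } };
        struct counter_closure b = { next_value, { 100 } };
        printf("%d\n", a.proc(&a.env));   /* 1   */
        printf("%d\n", a.proc(&a.env));   /* 2   */
        printf("%d\n", b.proc(&b.env));   /* 101 */
        return 0;
    }

Same procedure, different captured state: this is the implementation view of closures that the note refers to.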

20181107

Project Oberon by Niklaus Wirth


  • In spite of great leaps forward, hardware is becoming faster more slowly than software is becoming slower.
  • The vast complexity of popular operating systems makes them not only obscure, but also provides opportunities for “back doors”.
  • Mostly thanks to the regularity of the RISC instruction set, the size of the compiler could be reduced significantly.
  • Our programs should be expressed in a manner that makes no reference to machine peculiarities and low-level programming facilities, perhaps with the exception of device interfaces, where dependence is inherent.
  • In order to warrant the sizeable effort of designing and constructing an entire operating system from scratch, a number of basic concepts need to be novel.
  • The fundamental objective of an operating system is to present the computer to the user and to the programmer at a certain level of abstraction.
  • Every abstraction is characterized by certain properties and governed by a set of operations.
  • Every abstraction inherently hides details, namely those from which it abstracts.
  • High interactivity requires high bandwidth, and the only channel of human users with high bandwidth is the eye. Consequently, the computer’s visual output unit must be properly matched with the human eye.
  • In the Oberon system, the display is partitioned into viewers, also called windows, or more precisely frames, rectangular areas of the screen.
  • High interactivity requires not only a high bandwidth for visual output, it demands also flexibility of input.
  • In Oberon, the notion of a unit of action is separated from the notion of a unit of compilation.
  • One of the rules of what may be called the Oberon programming style is therefore to avoid hidden states, and to reduce the introduction of global variables.
  • We classify Oberon as a single-process (or single-thread) system. [...] Unless engaged in the interpretation of a command, the processor is engaged in a loop continuously polling event sources. This loop is called the central loop; it is contained in module Oberon which may be regarded as the system’s heart.
  • The primary advantage of a system dealing with a single process is that task switches occur at user-defined points only, where no local process state has to be preserved until resumption.
  • The Oberon system features no separate linker. A module is linked with its imports when it is loaded, never before.
  • The high priority given in the system’s conception to modularity, to avoid unnecessary frills, and to concentrate on the indispensable in the core, has resulted in a system of remarkable compactness.
  • We do not consider it as good engineering practice to consume a resource lavishly just because it happens to be cheap.
  • Implementation of a system proceeds bottom-up. Naturally, because modules on higher levels are clients of those on the lower levels and cannot function without the availability of their imports.
  • Description of a system, on the other hand, is better ordered in the top-down direction. This is because a system is designed with its expected applications and functions in mind.
  • Commands in Oberon are explicit, atomic units of interactive operations.
  • It is the generic ability to perform every conceivable task that turns a computing device into a versatile universal tool.
  • Transfers of control between tasks are implemented in Oberon as ordinary calls and returns of ordinary procedures. Preemption is not possible.
  • Interactive tasks are triggered by input data being present, either from the keyboard, the mouse, or other input sources. Background tasks are taken up in a round-robin manner. Interactive tasks have priority.
  • The most important generic function of any operating system is executing programs.
  • Quintessentially, Oberon programs are represented as commands: exported parameterless procedures that do not interact with the user of the system.
  • The concept of abstraction is arguably the most important achievement of programming language development.
  • The term loading refers to the transfer of the module code from the file into the main memory, from where the processor fetches individual instructions.
  • The linking process may require a significant amount of address computations.
  • The purpose of the loader is to read object files, and to transform the file representation of modules into their internal image.
  • It is essential that a computer system has a facility for storing data over longer periods of time and for retrieving the stored data. Such a facility is called a file system.
  • A file system must not only provide the concept of a sequence with its accessing mechanism, but also a registry. This implies that files be identified, that they can be given a name by which they are registered and retrieved.
  • Experience shows that in practice most files are quite short, i.e. in the order of a few thousand bytes.
  • A crucial property of the Oberon system is centralized resource management.
  • Device drivers are collections of procedures that constitute the immediate interface between hardware and software.
  • Drivers are inherently hardware specific, and the justification of their existence is precisely that they encapsulate these specifics and present to their clients an appropriate abstraction of the device.
  • The key codes received from the keyboard via a PS/2 line are not identical with the character values delivered to the Read procedures. A conversion is necessary. This is so because modern keyboards treat all keys in the same way, including the ones for upper case, control, alternative, etc. Separate codes are sent to signal the pushing down and the release of a key, followed by another code identifying which key had been pressed or released.
  • Oberon is a single-process system where every command monopolizes the processor until termination.
  • It appears to be a universal law that centralization inevitably calls for an administration.
  • The compiler is the primary tool of the system builder.
  • Compilation of a program text proceeds by analyzing the text and thereby decomposing it recursively into its constructs according to the syntax. When a construct is identified, code is generated according to the semantic rule associated with the construct.
  • The recognition of symbols within a character sequence is called lexical analysis.
  • Procedure bodies are surrounded by a prolog (entry code) and an epilog (exit code).
  • Besides the parsing of text, the Parser also performs the checking for type consistency of objects.
  • The superiority of a tree structure becomes manifest only when a large number of global objects is declared.
  • Procedure calls cause a sequence of frames to be allocated in a stack fashion. These frames are the storage space for local variables.
  • Static typing is an important principle in programming languages. It implies that every constant, variable or function is of a certain data type, and that this type can be derived by reading the program text without executing it. It is the key principle to introduce important redundancy in languages in such a form that a compiler can detect inconsistencies. It is therefore the key element for reducing the number of errors in programs.
  • Implementation of multiplication in hardware made the operation about 30 times faster than its solution by software.
  • The principal task of the control unit is to generate the address of the next instruction.
  • The system’s core consists of a loop which constantly senses for a command to appear; a sketch of such a polling loop appears at the end of this list.
  • Make it as simple as possible, but not simpler.
  • Oberon is a general-purpose programming language that evolved from Modula-2. Its principal new feature is the concept of type extension.
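
The central loop described above can be sketched (in C rather than Oberon, with invented stand-ins for the real event sources) as a single non-preemptive polling loop in which interactive tasks take priority over background tasks.

    #include <stdbool.h>
    #include <stdio.h>

    /* Invented stand-ins for the real event sources (keyboard, mouse, timers). */
    static bool input_available(void) { return false; }
    static void handle_input(void)    { /* would run the selected command to completion */ }
    static void background_task(void) { puts("background task"); }

    int main(void) {
        /* The central loop: a single process with no preemption.  Control returns
           here only when a task finishes, so task switches happen only at these
           user-defined points and no local task state needs to be preserved. */
        for (int i = 0; i < 3; i++) {      /* bounded here so the sketch terminates */
            if (input_available()) {
                handle_input();            /* interactive tasks have priority */
            } else {
                background_task();         /* background tasks are taken up round robin */
            }
        }
        return 0;
    }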

20181105

Crafting Interpreters by Bob Nystrom


  • Static type systems in particular require rigorous formal reasoning.
  • Being able to reason precisely and formally about syntax and semantics is a vital skill when working on a language.
  • For every successful general-purpose language out there, there are a thousand successful niche ones.
  • Implementing a language is a real test of programming skill.
  • You must master recursion, dynamic arrays, trees, graphs, and hash tables.
  • A compiler reads in files in one language and translates them to files in another language. You can implement a compiler in any language, including the same language it compiles, a process called “self-hosting”.
  • C is the perfect language for understanding how an implementation really works, all the way down to the bytes in memory and the code flowing through the CPU.
  • The first step is scanning, or lexing, or (if you’re trying to impress someone) lexical analysis.
  • A scanner takes in a linear stream of characters and chunks them together into a series of something more akin to “words”. In programming languages, each of these words is called a token.
  • The next step is parsing. This is where our syntax gets a grammar--the ability to compose larger expressions and statements out of smaller parts.
  • A parser takes a flat sequence of tokens and builds a tree structure that mirrors the nested nature of the grammar.
  • The first bit of analysis that most languages do is called binding or resolution. For each identifier we find out where that name is defined and wire the two together. This is where scope comes into play--the region of source code where a certain name can be used to refer to a certain declaration.
  • The most powerful bookkeeping tool is to transform the tree into an entirely new data structure that more directly expresses the semantics of the code.
  • You can think of the compiler as a pipeline where each stage's job is to organize the code in a way that makes the next stage simpler to implement.
  • If some expression always evaluates to the exact same value we can do the evaluation at compile time and replace the code for the expression with its result.
  • Optimizing is a huge part of the programming language business.
  • Many successful languages have surprisingly few compile-time optimizations.
  • Native code is lightning fast, but generating it is a lot of work.
  • Speaking the chip’s language also means your compiler is tied to a specific architecture.
  • In a fully compiled language, the code implementing the runtime gets inserted directly into the resulting executable.
  • If the language is run inside an interpreter or a VM, then the runtime lives there.
  • Compiling is an implementation technique that involves translating a source language to some other--usually lower-level--form.
  • When we say a language implementation “is a compiler”, we mean it translates source code to some other form but doesn’t execute it.
  • When we say an implementation “is an interpreter”, we mean it takes in source code and executes it immediately.
  • C’s most egregious grammar problems are around types.
  • A static type system is a ton of work to learn and implement.
  • High-level languages exist to eliminate error-prone, low-level drudgery.
  • There are two main techniques for managing memory: reference counting and tracing garbage collection.
  • Where an expression’s main job is to produce a value, a statement’s job is to produce an effect.
  • An argument is an actual value you pass to a function when you call it.
  • A parameter is a variable that holds the value of the argument inside the body of the function.
  • Prototypes are simpler in the language, but they seem to accomplish that only by pushing the complexity onto the user.
  • The idea behind object-oriented programming is encapsulating behavior and state together.
  • If you care about making a language that is actually usable, then handling errors gracefully is vital.
  • It’s a good engineering practice to separate the code that generates the errors from the code that reports them.
  • Our job is to scan through the list of characters and group them together in the smallest sequences that still represent something. Each of these blobs of characters is called a lexeme.
  • An interactive prompt is also called a REPL.
  • That’s a token: a bundle containing the raw lexeme along with the other things the scanner learned about it.
  • The core of the scanner is a loop. Starting at the first character of the source code, it figures out what lexeme it belongs to, and consumes it and any following characters that are part of that lexeme. When it reaches the end of that lexeme, it emits a token.
  • Maximal munch: When two grammar rules can both match a chunk of code that the scanner is looking at, whichever one matches the most characters wins (the scanner sketch at the end of these notes applies this rule).
  • Regular expressions aren’t powerful enough to handle expressions which can nest arbitrarily deeply.
  • A formal grammar takes a set of atomic pieces it calls its “alphabet”. Then it defines a (usually infinite) set of “strings” that are “in” the grammar. Each string is a sequence of “letters” in the alphabet.
  • A formal grammar’s job is to specify which strings are valid and which aren’t.
  • The visitor pattern is really about approximating the functional style within an OOP language.
  • Unlike overriding, overloading is statically dispatched at compile time.
  • When we debug our parser and interpreter, it’s often useful to look at a parsed syntax tree and make sure it has the structure we expect.
  • Converting a tree to a string is sort of the opposite of a parser, and is often called “pretty printing” when the goal is to produce a string of text that is valid syntax in the source language.
  • Writing a real parser--one with decent error-handling, a coherent internal structure, and the ability to robustly chew through a sophisticated syntax--is considered a rare, impressive skill.
  • Precedence determines which operator is evaluated first in an expression containing a mixture of different operators.
  • Associativity determines which operator is evaluated first in a series of the same operator.
  • Recursive descent is the simplest way to build a parser, and doesn’t require using complex parser generator tools; a toy precedence example appears in the sketches at the end of these notes.
  • Recursive descent parsers are fast, robust, and can support sophisticated error handling.
  • A parser really has two jobs:
    • Given a valid sequence of tokens, produce a corresponding syntax tree.
    • Given an invalid sequence of tokens, detect any errors and tell the user about their mistakes.
  • Syntax errors are a fact of life and language tools have to be robust in the face of them.
  • The way a parser responds to an error and keeps going to look for later errors is called “error recovery”.
  • The traditional place in the grammar to synchronize is between statements.
  • Taking advantage of what users already know is one of the most powerful tools you can use to ease adoption of your language. It’s almost impossible to overestimate how useful this is.
  • A literal is a bit of syntax that produces a value.
  • Once you misinterpret bits in memory, all bets are off.
  • While a runtime error needs to stop evaluating the expression, it shouldn’t kill the interpreter. If a user is running the REPL and has a typo in a line of code, they should still be able to keep going and enter more code after that.
  • Some languages are statically typed, which means type errors are detected and reported at compile time, before any code is run.
  • Others are dynamically typed and defer checking for type errors until runtime right before an operation is attempted.
  • A key reason users choose statically typed languages is because of the confidence the language gives them that certain kinds of errors can never occur when their program is run.
  • To support bindings, our interpreter needs internal state.
  • State and statements go hand in hand. Since statements, by definition, don’t evaluate to a value, they need to do something else to be useful. That something is called a “side effect”.
  • A single token lookahead recursive descent parser can’t see far enough to tell that it’s parsing an assignment until after it has gone through the left-hand side and stumbled onto the =.
  • The key difference between assignment and definition is that assignment is not allowed to create a new variable.
  • A scope is a region where a name maps to a certain entity. Multiple scopes enable the same name to refer to different things in different contexts.
  • Lexical scope is a specific style of scope where the text of the program itself shows where a scope begins and ends.
  • This is in contrast with dynamic scope where you don’t know what a name refers to until you execute the code.
  • One motivation for lexical scope is encapsulation--a block of code in one corner of the program shouldn’t interfere with some other one.
  • When a local variable has the same name as a variable in an enclosing scope, it shadows the outer one. Code inside the block can’t see it anymore, but it’s still there.
  • The main advantage to implicit declaration is simplicity.
  • In fact, any programming language with some minimum level of expressiveness is powerful enough to compute any computable function.
  • Reducing syntactic sugar to semantically equivalent but more verbose forms is called desugaring.
  • A rich syntax makes the language more pleasant and productive to work in.
  • Arity is the fancy term for the number of arguments a function or operation expects.
  • When it comes to making your language actually good at doing useful stuff, the native functions your implementation provides are key. They provide access to the fundamental services that all programs are defined in terms of.
  • Many languages also allow users to provide their own native functions. The mechanism for doing so is called a foreign function interface (FFI), native extension, native interface, or something along those lines.
  • By using different environments when we execute the body, calls to the same function with the same code can produce different results.
  • When a function is declared, it captures a reference to the current environment.
  • A closure retains a reference to the environment instance in play when the function was declared.
  • There are three broad paths to object-oriented programming: classes, prototypes, and multimethods.
  • Doing a hash table lookup for every field access is fast enough for many language implementations, but not ideal.
  • Methods and fields let us encapsulate state and behavior together so that an object always stays in a valid configuration.
  • Prototypes are simpler than classes--less code for the language implementer to write, and fewer concepts for the user to learn and understand.
  • Breadth is the range of different things the language lets you express.
  • Ease is how little effort it takes to make the language do what you want.
  • Complexity is how big the language is.
  • Inheriting from another class means that everything that’s true of the superclass should be true, more or less, of the subclass.
  • A tree-walk interpreter is fine for some kinds of high-level, declarative languages. But for a general-purpose, imperative language--even a “scripting” language--it won’t fly.
  • A dynamically-typed language is never going to be as fast as a statically-typed language with manual memory management.
  • In engineering, few choices are without trade-offs.
  • Modern CPUs process data way faster than they can pull it from RAM. To compensate for that, chips have multiple layers of caching.
  • Many implementations of malloc() store the allocated size in memory right before the returned address.
  • C asks us not just to manage memory explicitly, but mentally.
  • A trie stores a set of strings.
  • Tries are a special case of an even more fundamental data structure: a deterministic finite automaton.
  • We sometimes fall into the trap of thinking that performance comes from complicated data structures, layers of caching, and other fancy optimizations. But, many times, all that’s required is to do less work, and I often find that writing the simplest code I can is sufficient to accomplish that.
  • Bytecode was good enough for Niklaus Wirth.
  • Pratt parsers are a sort of oral tradition in industry.
  • Vaughan Pratt’s “top-down operator precedence parsing” is the most elegant way I know to parse expressions.
  • A compiler has roughly two jobs. It parses the user’s source code to understand what it means. Then it takes that knowledge and outputs low-level instructions that produce the same semantics.
  • Good error handling and reporting is more valuable to users than almost anything else you can put time into in the front end.
  • You wouldn’t believe the breadth of software problems that miraculously seem to require a new little language in their solution as soon as you ask a compiler hacker for help.
  • The nice thing about working in C is that we can build our data structures from the raw bits up.
  • A value contains two parts: a “type” tag and a payload for the actual value (sketched at the end of these notes).
  • A union looks like a struct except that all of its fields overlap in memory.
  • The size of a union is the size of its largest field.
  • Most architectures prefer values to be aligned to their size.
  • A bytecode VM spends much of its execution time reading and decoding instructions. The fewer, simpler instructions you need for a given piece of behavior, the faster it goes.
  • There’s no maximum length for a string.
  • We need a way to support values whose size varies, sometimes greatly. This is exactly what dynamic allocation on the heap is designed for.
  • The [garbage] collector must ensure it can find every bit of memory that is still being used so that it doesn’t collect live data.
  • If your language needs GC, get it working as soon as you can.
  • Choosing a character representation and encoding involves fundamental trade-offs.
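
A small C sketch of the scanner core loop and the maximal-munch rule from the notes above. This is not the book's actual code; the token kinds and helpers are deliberately simplified.

    #include <ctype.h>
    #include <stdio.h>

    typedef enum { TOK_EQUAL, TOK_EQUAL_EQUAL, TOK_IDENT, TOK_EOF } TokenType;
    typedef struct { TokenType type; const char *start; int length; } Token;

    static const char *src;   /* cursor into the source text */

    static Token make(TokenType type, const char *start, int length) {
        Token t = { type, start, length };
        return t;
    }

    /* The scanner core: start at the current character, consume every character
       that belongs to the same lexeme, then emit exactly one token for it. */
    static Token next_token(void) {
        while (*src == ' ') src++;
        const char *start = src;
        if (*src == '\0') return make(TOK_EOF, start, 0);
        if (*src == '=') {
            src++;
            /* Maximal munch: "==" beats two separate "=" tokens. */
            if (*src == '=') { src++; return make(TOK_EQUAL_EQUAL, start, 2); }
            return make(TOK_EQUAL, start, 1);
        }
        while (isalpha((unsigned char)*src)) src++;   /* identifiers munch as far as possible */
        if (src == start) src++;                      /* skip anything unrecognized */
        return make(TOK_IDENT, start, (int)(src - start));
    }

    int main(void) {
        src = "orchid == or";
        for (Token t = next_token(); t.type != TOK_EOF; t = next_token())
            printf("%d '%.*s'\n", t.type, t.length, t.start);
        return 0;
    }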
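
The precedence and associativity notes above fall out of a recursive descent parser almost for free: one function per precedence level, with a loop for left associativity. A toy C sketch that evaluates + and * over single digits (an evaluator rather than a tree builder, to keep it short; not the book's jlox or clox code).

    #include <stdio.h>

    static const char *p;   /* cursor into the expression text */

    /* primary -> digit */
    static int parse_primary(void) {
        return *p++ - '0';
    }

    /* factor -> primary ( "*" primary )*    (the higher-precedence level) */
    static int parse_factor(void) {
        int value = parse_primary();
        while (*p == '*') { p++; value *= parse_primary(); }   /* the loop gives left associativity */
        return value;
    }

    /* term -> factor ( "+" factor )*        (the lower-precedence level) */
    static int parse_term(void) {
        int value = parse_factor();
        while (*p == '+') { p++; value += parse_factor(); }
        return value;
    }

    int main(void) {
        p = "1+2*3+4";
        printf("%d\n", parse_term());   /* prints 11: '*' binds tighter than '+' */
        return 0;
    }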
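
A hedged C sketch of the tagged value representation described above: a type tag plus a payload union whose fields overlap in memory. Field and enum names are illustrative rather than taken verbatim from the book.

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { VAL_BOOL, VAL_NIL, VAL_NUMBER } ValueType;

    typedef struct {
        ValueType type;        /* which union field is currently meaningful */
        union {                /* fields overlap, so only the largest one counts toward size */
            bool   boolean;
            double number;
        } as;
    } Value;

    static Value number_val(double d) {
        Value v; v.type = VAL_NUMBER; v.as.number = d; return v;
    }

    static void print_value(Value v) {
        switch (v.type) {
            case VAL_NUMBER: printf("%g\n", v.as.number); break;
            case VAL_BOOL:   printf(v.as.boolean ? "true\n" : "false\n"); break;
            case VAL_NIL:    printf("nil\n"); break;
        }
    }

    int main(void) {
        print_value(number_val(3.25));
        /* The tag plus the 8-byte double, padded for alignment: typically 16 bytes. */
        printf("sizeof(Value) = %zu\n", sizeof(Value));
        return 0;
    }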