- It is the essence of any academic education that not only knowledge (and, in the case of an engineering education, know-how) is transmitted, but also understanding and insight.
- Every academically educated computer scientist must know how a computer functions, and must understand the ways and methods in which programs are represented and interpreted.
- Computer programs are formulated in a programming language and specify classes of computing processes. Computers, however, interpret sequences of particular instructions, but not program texts. Therefore, the program text must be translated into a suitable instruction sequence before it can be processed by a computer. This translation can be automated, which implies that it can be formulated as a program itself. The translation program is called a compiler, and the text to be translated is called source text (or sometimes source code).
- The translation process essentially consists of the following parts:
- The sequence of characters of a source text is translated into a corresponding sequence of symbols of the vocabulary of the language.
- The sequence of symbols is transformed into a representation that mirrors the syntactic structure of the source text and lets this structure easily be recognized.
- In addition to syntactic rules, compatibility rules among types of operators and operands define the language. Hence, verification of whether these compatibility rules are observed by a program is an additional duty of a compiler. This verification is called type checking.
- On the basis of the representation resulting from step 2, a sequence of instructions taken from the instruction set of the target computer is generated. This phase is called code generation.
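The phases listed above can be sketched for a tiny expression language. This is a minimal illustration, not the book's code: the function names, the token shapes, and the stack-machine instructions are all invented for this sketch, and type checking is omitted.

```python
# Minimal sketch of the three phases for expressions like "1 + 2 * 3".
# All names and instruction mnemonics here are illustrative.

def scan(text):
    """Phase 1: sequence of characters -> sequence of symbols (tokens)."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("num", int(text[i:j])))
            i = j
        elif c in "+*":
            tokens.append((c, c))
            i += 1
        else:
            raise SyntaxError("bad character: " + c)
    return tokens

def parse(tokens):
    """Phase 2: symbols -> tree mirroring the syntactic structure.
    Grammar: expr = term {"+" term}; term = num {"*" num}."""
    pos = 0
    def term():
        nonlocal pos
        node = tokens[pos][1]; pos += 1
        while pos < len(tokens) and tokens[pos][0] == "*":
            pos += 1
            node = ("*", node, tokens[pos][1]); pos += 1
        return node
    node = term()
    while pos < len(tokens) and tokens[pos][0] == "+":
        pos += 1
        node = ("+", node, term())
    return node

def generate(tree):
    """Phase 3: syntax tree -> instructions for a hypothetical stack machine."""
    if isinstance(tree, int):
        return [("PUSH", tree)]
    op, left, right = tree
    return generate(left) + generate(right) + [("ADD" if op == "+" else "MUL",)]

print(generate(parse(scan("1 + 2 * 3"))))
```

Note how `parse` gives `*` higher precedence than `+` simply by placing it deeper in the grammar, so `generate` emits the multiplication before the addition.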
- A partitioning of the compilation process into as many parts as possible was the predominant technique until about 1980, because until then the available store was too small to accommodate the entire compiler. Only individual compiler parts would fit, and they could be loaded one after the other in sequence. The parts were called passes, and the whole was called a multi-pass compiler.
- Modern computers with their apparently unlimited stores make it feasible to avoid intermediate storage to disk. With it, the complicated process of serializing a data structure for output and reconstructing it on input can be discarded as well. With single-pass compilers, increases in speed by factors of several thousands are therefore possible. Instead of being tackled one after another in strictly sequential fashion, the various parts (tasks) are interleaved.
- A compiler which generates code for a computer different from the one executing the compiler is called a cross compiler. The generated code is then transferred (downloaded) via a data transmission line.
- Every language displays structure called its grammar or syntax.
- A language is, therefore, the set of sequences of terminal symbols which, starting with the start symbol, can be generated by repeated application of syntactic equations, that is, substitutions.
- A language is regular, if its syntax can be expressed by a single EBNF expression.
- The reason for our interest in regular languages lies in the fact that programs for the recognition of regular sentences are particularly simple and efficient.
- Sentence recognition is called syntax analysis.
- Ultimately, the basic idea behind every language is that it should serve as a means for communication. This means that partners must use and understand the same language. Promoting the ease by which a language can be modified and extended may therefore be rather counterproductive.
- The scanner has to recognize terminal symbols in the source code.
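A scanner's job can be sketched in a few lines. The keyword set and the function name below are illustrative, assuming a small Oberon-like subset; a real scanner would also handle comments, strings, and multi-character operators.

```python
KEYWORDS = {"IF", "THEN", "WHILE", "DO", "END"}  # illustrative subset

def next_symbol(text, i):
    """Recognize one terminal symbol starting at index i.
    Returns (symbol class, value, index just past the symbol)."""
    if text[i].isalpha():
        j = i
        while j < len(text) and text[j].isalnum():
            j += 1
        word = text[i:j]
        # A word is either a reserved keyword or an identifier.
        return (word if word in KEYWORDS else "ident", word, j)
    if text[i].isdigit():
        j = i
        while j < len(text) and text[j].isdigit():
            j += 1
        return ("number", int(text[i:j]), j)
    # Any other single character is its own symbol (e.g. "+", ";").
    return (text[i], text[i], i + 1)
```

The key point is that the scanner works strictly left to right, committing to a symbol as soon as its end is reached, so the parser never has to look at individual characters.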
- The essential characteristics of a good compiler, regardless of details, are that (1) no sequence of symbols leads to its crash, and (2) frequently encountered errors are correctly diagnosed and subsequently generate no, or few additional, spurious error messages.
- The simplest form of data structure for representing a set of items is the list. Its major disadvantage is a relatively slow search process, because the list has to be traversed from its root to the desired element.
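A minimal list-based symbol table illustrates the linear search cost. The `Node` fields and the idea of prepending new declarations are illustrative choices for this sketch, not prescribed by the text.

```python
class Node:
    """One list element: an identifier and its associated information."""
    def __init__(self, name, info, next):
        self.name, self.info, self.next = name, info, next

def find(head, name):
    """Linear search: traverse from the root of the list until the name matches.
    Cost grows with the distance of the entry from the root."""
    while head is not None:
        if head.name == name:
            return head.info
        head = head.next
    return None

# A table holding two declarations; later declarations sit nearer the root.
table = Node("x", "INTEGER", Node("y", "BOOLEAN", None))
```

More elaborate structures (binary trees, hash tables) trade this simplicity for faster search.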
- In languages featuring data types, their consistency checking is one of the most important tasks of a compiler.
- Consecutively declared variables are then allocated with monotonically increasing or decreasing addresses. This is called sequential allocation.
- The size of an array is its element size multiplied by the number of its elements. The address of an element is the sum of the array's address and the element's index multiplied by the element size.
- Absolute addresses of variables are usually unknown at the time of compilation. All generated addresses must be considered as relative to a common base address which is given at run-time. The effective address is then the sum of this base address and the address determined by the compiler.
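The two addressing rules above combine into a single sum: the run-time base address, plus the compiler-determined offset of the array, plus the scaled index. A small sketch (the parameter names are illustrative):

```python
def element_address(base, array_offset, index, element_size):
    """Effective address of a[index]:
    run-time base address
    + compiler-determined offset of the array within the frame
    + index * element size."""
    return base + array_offset + index * element_size

# An array of 4-byte integers placed at offset 16 from the base address:
# a[3] lives at 1000 + 16 + 3*4 = 1028.
assert element_address(1000, 16, 3, 4) == 1028
```

Only `base` is unknown at compile time; the compiler can fold `array_offset + index * element_size` into a constant whenever the index is itself a constant.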
- Although bytes can be accessed individually, typically a small number of bytes (say 4 or 8) are transferred from or to memory as a packet, a so-called word.
- Following a longstanding tradition, addresses of variables are assigned negative values, that is, negative offsets to the common base address determined during program execution.
- The acronym RISC stands for reduced instruction set computer, where "reduced" is to be understood as relative to architectures with large sets of complex instructions, as these were dominant until about 1980.
- From the viewpoints of the programmer and the compiler designer, the computer consists of an arithmetic unit, a control unit, and a store.
- Whenever arithmetic expressions are evaluated, the inherent danger of overflow exists. The evaluating statements should therefore be suitably guarded.
- The essence of delayed code generation is that code is not emitted before it is clear that no better solution exists.
- Conditional and repeated statements are implemented with the aid of jump instructions, also called branch instructions.
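The lowering of a WHILE statement to branch instructions can be sketched as a template. The label and instruction names below are invented for the sketch; real instruction sets differ.

```python
def compile_while(cond_code, body_code):
    """Lower `WHILE cond DO body END` to jump instructions.
    Labels L0/L1 and the mnemonics are illustrative."""
    return (
        ["L0:"]                      # loop entry: re-evaluate the condition
        + cond_code
        + ["BRANCH_IF_FALSE L1"]     # conditional forward jump out of the loop
        + body_code
        + ["BRANCH L0",              # unconditional backward jump
           "L1:"]
    )

print(compile_while(["LOAD x", "CMP_GT 0"], ["DEC x"]))
```

Conditional statements use the same two ingredients, a conditional forward branch past the guarded code, without the backward branch.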
- Procedures, which are also known as subroutines, are perhaps the most important tool for structuring programs.
- Addresses of local variables generated by the compiler are always relative to the base address of the respective activation frame.
- Parameters constitute the interface between the calling and the called procedures. Parameters on the calling side are said to be actual parameters, and those on the called side formal parameters.
- An open array is an array parameter whose length is not known (open) at the time of compilation.
- The unit consisting of array address and lengths is called an array descriptor.
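A one-dimensional descriptor can be sketched as a pair of address and length; the field names and the flat `memory` mapping below are illustrative assumptions, not the book's representation.

```python
from dataclasses import dataclass

@dataclass
class ArrayDescriptor:
    address: int   # where the elements start in the store
    length: int    # supplied at run time, since it is unknown at compile time

def sum_elements(memory, desc, element_size=4):
    """The callee iterates using the length carried in the descriptor."""
    total = 0
    for i in range(desc.length):
        total += memory[desc.address + i * element_size]
    return total

# Three 4-byte elements starting at address 100:
memory = {100: 7, 104: 8, 108: 9}
desc = ArrayDescriptor(address=100, length=3)
```

For multidimensional open arrays the descriptor carries one length per dimension, which is why the note speaks of "lengths".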
- The type to which a pointer is bound is called its base type.
- A variable is no longer relevant when there are no references to it, references emanating from declared pointer variables.
- The range of visibility of an identifier in the text is called scope, and it extends over the block in which the identifier is declared.
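Nested scopes are naturally searched from the innermost block outward, so a local declaration hides an outer one of the same name. A dictionary-chain sketch (the function names are illustrative):

```python
# Scopes form a chain: each block's scope points to the enclosing one.
def new_scope(outer):
    return {"names": {}, "outer": outer}

def declare(scope, name, info):
    scope["names"][name] = info

def lookup(scope, name):
    """Search the innermost scope first, then the enclosing scopes."""
    while scope is not None:
        if name in scope["names"]:
            return scope["names"][name]
        scope = scope["outer"]
    return None

globals_ = new_scope(None)
declare(globals_, "x", "global INTEGER")
inner = new_scope(globals_)
declare(inner, "x", "local INTEGER")
```

Here `lookup(inner, "x")` finds the local declaration, while `lookup(globals_, "x")` still sees the global one: the local `x` hides, but does not destroy, the outer binding.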
- Software is not "written", but grows.
- Once agreement is reached about the partitioning of a system into modules and about their interfaces, the team members can proceed independently in implementing the module assigned to them.
- The successful development of complex systems crucially depends on the concept of modules and their separate compilation.
- A primary goal of good code optimization is the most effective use of registers in order to reduce the number of accesses to the relatively slow main memory. A good strategy of register usage yields more advantages than any other branch of optimization.
- In many compilers, local variables are allocated to registers only in procedures which do not contain any calls themselves (leaf procedures), and which therefore are also called most frequently, as they constitute the leaves in the tree representing the procedure call hierarchy.
20170815
Compiler Construction by Niklaus Wirth