Justin Spencer: January 2018

20180131

Game Programming Patterns

Every program has some organization, even if it's just "jam the whole thing into main() and see what happens".
For me, good design means that when I make a change, it's as if the entire program was crafted in anticipation of it.
The measure of a design is how easily it accommodates changes.
Once you understand the problem and the parts of the code it touches, the actual coding is sometimes trivial.
I won't get on a soapbox here, but I'll ask you to consider doing more automated testing if you aren't already.
Loading code into neurons is so painfully slow that it pays to find strategies to reduce the volume of it.
To me, this is a key goal of software architecture: minimize the amount of knowledge you need to have in-cranium before you can make progress.
A lot of software architecture is about making your program more flexible. It's about making it take less effort to change it. That means encoding fewer assumptions in the program.
But performance is all about assumptions. The practice of optimization thrives on concrete limitations.
The faster you can try out ideas and see how they feel, the more you can try and the more likely you are to find something great.
Making your program more flexible so you can prototype faster will have some performance cost.
Prototyping--slapping together code that's just barely functional enough to answer a design question--is a perfectly legitimate programming practice. There is a very large caveat, though. If you write throwaway code, you must ensure you're able to throw it away.
One trick to ensuring your prototype code isn't obliged to become real code is to write it in a language different from the one your game uses. That way, you have to rewrite it before it can end up in your actual game.
The implementation that's quickest to write is rarely the quickest to run.
Abstraction and decoupling make evolving your program faster and easier, but don't waste time doing them unless you're confident the code in question needs that flexibility.
Think about and design for performance throughout your development cycle, but put off the low-level, nitty-gritty optimizations that lock assumptions into your code until as late as possible.
Move quickly to explore your game's design space, but don't go so fast that you leave a mess behind you. You'll have to live with it, after all.
If you are going to ditch code, don't waste time making it pretty. Rock stars trash hotel rooms because they know they're going to check out the next day.
But, most of all, if you want to make something fun, have fun making it.
I think some patterns are overused, while others are underappreciated.
"Commands" are an object-oriented replacement for callbacks.
If a command object can do things, it's a small step for it to be able to undo them. Without the Command pattern, implementing undo is surprisingly hard. With it, it's a piece of cake.
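A minimal Python sketch of the undo idea: each command records what it needs to reverse itself. The names Unit and MoveCommand are hypothetical, chosen only for illustration.

```python
class Unit:
    """Hypothetical game entity with a 2D position."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y


class MoveCommand:
    """Moves a unit and remembers where it was so the move can be undone."""

    def __init__(self, unit, x, y):
        self.unit = unit
        self.x, self.y = x, y
        self._before = None

    def execute(self):
        # Save the old position before overwriting it.
        self._before = (self.unit.x, self.unit.y)
        self.unit.x, self.unit.y = self.x, self.y

    def undo(self):
        self.unit.x, self.unit.y = self._before


unit = Unit()
history = []
for command in (MoveCommand(unit, 1, 2), MoveCommand(unit, 5, 5)):
    command.execute()
    history.append(command)

history.pop().undo()  # the unit is back at (1, 2)
```

Keeping executed commands on a stack is what makes multi-level undo cheap: popping and undoing repeatedly walks back through history.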
Friends don't let friends create singletons.
Flyweight, like its name implies, comes into play when you have objects that need to be more lightweight, generally because you have too many of them.
With instanced rendering, it's not so much that they take up too much memory as it is they take too much time to push each separate tree over the bus to the GPU, but the basic idea is the same.
As always, the golden rule of optimization is profile first. Modern computer hardware is too complex for performance to be a game of pure reason anymore.
You can't throw a rock at a computer without hitting an application built using the Model-View-Controller architecture, and underlying that is the Observer pattern.
It [the observer pattern] lets one piece of code announce that something interesting happened without actually caring who receives the notification.
In architecture, we're most often trying to make systems better, not perfect.
Sending a notification is simply walking a list and calling some virtual methods. Granted, it's a bit slower than a statically dispatched call, but that cost is negligible in all but the most performance-critical code.
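As a rough Python illustration of "walking a list and calling some methods", here is a sketch where observers are plain callables. Subject, add_observer, and the event name are assumptions made for the example.

```python
class Subject:
    """Announces events without knowing who is listening."""

    def __init__(self):
        self._observers = []

    def add_observer(self, callback):
        self._observers.append(callback)

    def notify(self, event):
        # Sending a notification is just a loop over registered callbacks.
        for callback in self._observers:
            callback(event)


log = []
physics = Subject()
physics.add_observer(lambda event: log.append(("achievements", event)))
physics.add_observer(lambda event: log.append(("audio", event)))
physics.notify("fell_off_bridge")
```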
When we get fuzzy about terminology, we lose the ability to communicate clearly and succinctly.
If you're responding to an event synchronously, you need to finish and return control as quickly as possible so that the UI doesn't lock up. When you have slow work to do, push it onto another thread or a work queue.
Dynamic allocation takes time, as does reclaiming memory, even if it happens automatically.
The reason design patterns get a bad rap is because people apply good patterns to the wrong problem and end up making things worse.
People--even those of us who've spent enough time in the company of machines to have some of their precise nature rub off on us--are reliably terrible at being reliable. That's why we invented computers: they don't make the mistakes we so often do.
The key idea [of the prototype pattern] is that an object can spawn other objects similar to itself.
Many people think "object-oriented programming" is synonymous with "classes".
OOP lets you define "objects" which bundle data and code together.
Compared to structured languages like C and functional languages like Scheme, the defining characteristic of OOP is that it tightly binds state and behavior together.
Early games procedurally generated almost everything so they could fit on floppies and old game cartridges. In many games today, the code is just an "engine" that drives the game, which is defined entirely in data.
Despite noble intentions, the Singleton pattern described by the Gang of Four usually does more harm than good.
There are times when a class cannot perform correctly if there is more than one instance of it. The common case is when the class interacts with an external system that maintains its own global state.
Saving memory and CPU cycles is always good.
As games got bigger and more complex, architecture and maintainability started to become the bottleneck. We struggled to ship games not because of hardware limitations, but because of productivity limitations.
Computer scientists call functions that don't access or modify global state "pure" functions. Pure functions are easier to reason about, easier for the compiler to optimize, and let you do neat things like memoization where you cache and reuse the results from previous calls to the function.
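Memoization is easy to try in Python because the standard library ships functools.lru_cache. The sketch below assumes a pure function (fib is just a convenient stand-in); caching an impure function this way would return stale results.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Pure function: the result depends only on n, so caching is safe."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
stats = fib.cache_info()  # each distinct n was computed exactly once
```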
An assertion function is a way of embedding a contract into your code.
Assertions help us track down bugs as soon as the game does something unexpected, not later when that error finally manifests as something visibly wrong to the user. They are fences in your code-base, corralling bugs so that they can't escape from the code that created them.
The general rule is that we want variables to be as narrowly scoped as possible while still getting the job done. The smaller the scope an object has, the fewer places we need to keep in our head while we're working with it.
The simplest solution, and often the best, is to simply pass the object you need as an argument to the functions that need it.
The goal of removing all global state is admirable, but rarely practical.
Many of the techniques compilers now use for parsing programming languages were invented for parsing human languages.
The coders you idolize who always seem to create flawless code aren't simply superhuman programmers. Instead, they have an intuition about which kinds of code are error-prone, and they steer away from them.
State machines help you untangle hairy code by enforcing a very constrained structure on it.
"Turing complete" means a system (usually a programming language) is powerful enough to implement a Turing machine in it, which means all Turing complete languages are, in some ways, equally expressive.
Double Buffer: Cause a series of sequential operations to appear instantaneous or simultaneous.
In their hearts, computers are sequential beasts. Their power comes from being able to break down the largest tasks into tiny steps that can be performed one after another. Often, though, our users need to see things occur in a single instantaneous step or see multiple tasks performed simultaneously.
A frame-buffer is an array of pixels in memory, a chunk of RAM where each couple of bytes represents the color of a single pixel.
This is why we need this pattern. Our program renders the pixels one at a time, but we need the display driver to see them all at once. Double buffering solves this.
The core problem that double buffering solves is state being accessed while it's being modified.
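A small Python sketch of the double buffer idea, assuming a single writer and a swap performed between frames. DoubleBuffer and its attribute names are hypothetical.

```python
class DoubleBuffer:
    """Readers use current_buffer; writers fill next_buffer; swap() flips them."""

    def __init__(self, size):
        self._buffers = [[0] * size, [0] * size]
        self._current = 0

    @property
    def current_buffer(self):
        return self._buffers[self._current]

    @property
    def next_buffer(self):
        return self._buffers[1 - self._current]

    def swap(self):
        # Flipping an index is cheap; no pixel data is copied.
        self._current = 1 - self._current


frame = DoubleBuffer(4)
for i in range(4):
    frame.next_buffer[i] = 255  # render one "pixel" at a time

seen_mid_render = list(frame.current_buffer)  # readers still see the old frame
frame.swap()
```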
A game loop runs continuously during game-play. Each turn of the loop, it processes user input without blocking, updates the game state, and renders the game. It tracks the passage of time to control the rate of game-play.
Computers are naturally deterministic; they follow programs mechanically. Non-determinism appears when the messy real world creeps in.
Most games use floating point numbers, and those are subject to rounding error. Each time you add two floating point numbers, the answer you get back can be a bit off.
Each entity in the game should encapsulate its own behavior.
The game world maintains a collection of objects. Each object implements an update method that simulates one frame of the object's behavior. Each frame, the game updates every object in the collection.
Favor 'object composition' over 'class inheritance'.
Being a proficient programmer takes years of dedicated training, after which you must contend with the sheer scale of your code-base.
To make a system that users enjoy, you have to embrace their humanity, including their fallibility. Making mistakes is what people do, and is a fundamental part of the creative process. Handling them gracefully with features like undo helps your users be more creative and create better work.
When you find yourself with a lot of sub-classes, that often means a data-driven approach is better.
Under the hood, C++ virtual methods are implemented using something called a "virtual function table", or just "vtable". A vtable is a simple struct containing a set of function pointers, one for each virtual method in a class. There is one vtable in memory for each class. Each instance of a class has a pointer to the vtable for its class.
Scripting languages and other higher-level ways of defining game behavior can give us a much needed productivity boost, at the expense of less optimal run-time performance. Since hardware keeps getting better but our brainpower doesn't, that trade-off starts to make more and more sense.
Simplest is often best.
Once you get the hang of a programming language, writing code to do what you want is actually pretty easy. What's hard is writing code that's easy to adapt when your requirements change.
A powerful tool we have for making change easier is decoupling.
Humans are mainly visual animals, but hearing is deeply connected to our emotions and our sense of physical space.
A queue stores a series of notifications or requests in first-in, first-out order.
Complexity slows you down, so treat simplicity as a precious resource.
When you have a piece of state that any part of the program can poke at, all sorts of subtle inter-dependencies creep in.
In practice, the best way to store a bunch of homogeneous things is almost always a plain old array.
There are a bunch of ways to implement queues, but my favorite is called a ring buffer.
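For reference, a ring buffer can be sketched in Python as a fixed-size list plus a head index and a count. This is one possible layout under the assumption of a fixed capacity, not the book's code; RingBuffer and its method names are made up for the example.

```python
class RingBuffer:
    """Fixed-capacity first-in, first-out queue backed by a plain list."""

    def __init__(self, capacity):
        self._items = [None] * capacity
        self._head = 0   # index of the oldest item
        self._count = 0  # number of items currently stored

    def enqueue(self, item):
        if self._count == len(self._items):
            raise OverflowError("ring buffer is full")
        # The tail wraps around to the start of the list.
        tail = (self._head + self._count) % len(self._items)
        self._items[tail] = item
        self._count += 1

    def dequeue(self):
        if self._count == 0:
            raise IndexError("ring buffer is empty")
        item = self._items[self._head]
        self._items[self._head] = None
        self._head = (self._head + 1) % len(self._items)
        self._count -= 1
        return item


queue = RingBuffer(3)
for sound in ("a", "b", "c"):
    queue.enqueue(sound)
first = queue.dequeue()
queue.enqueue("d")  # reuses the slot that "a" occupied
rest = [queue.dequeue() for _ in range(3)]
```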
Optimizing for performance is a deep art that touches all aspects of software.
Accelerate memory access by arranging data to take advantage of CPU caching.
Sure, we can process data faster than ever, but we can't get that data faster.
RAM hasn't been keeping up with increasing CPU speeds.
Whenever the chip reads some memory, it gets a whole cache line. The more you can use stuff in that cache line, the faster you go. So the goal then is to organize your data structures so that the things you're processing are next to each other in memory.
One of the hallmarks of software architecture is abstraction.
Avoid unnecessary work by deferring it until the result is needed.
A "dirty" flag tracks when the derived data is out of sync with the primary data. It is set when the primary data changes.
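A dirty flag can be sketched in a few lines of Python. The example assumes a one-dimensional transform where the derived world position is parent plus local; Transform and the recalculations counter are hypothetical and exist only to make the deferral visible.

```python
class Transform:
    """Caches a derived world position and recomputes it only when stale."""

    def __init__(self, local_x, parent_x=0):
        self._local_x = local_x
        self._parent_x = parent_x
        self._world_x = None
        self._dirty = True        # derived data starts out of sync
        self.recalculations = 0   # instrumentation for the example

    def set_local_x(self, value):
        self._local_x = value
        self._dirty = True        # primary data changed; defer the work

    @property
    def world_x(self):
        if self._dirty:
            self._world_x = self._parent_x + self._local_x
            self.recalculations += 1
            self._dirty = False
        return self._world_x


transform = Transform(2, parent_x=10)
first_read = transform.world_x
second_read = transform.world_x   # served from the cache
transform.set_local_x(5)
transform.set_local_x(7)          # two changes, still only one recalculation
third_read = transform.world_x
```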
Improve performance and memory use by reusing objects from a fixed pool instead of allocating and freeing them individually.
Fragmentation means the free space in our heap is broken into smaller pieces of memory instead of one large open block. The total memory available may be large, but the largest contiguous region might be painfully small.
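A sketch of an object pool in Python, assuming a fixed number of particles allocated up front and a linear search for a free slot. Python's memory model differs from C++, so this shows the reuse logic rather than the fragmentation benefit; Particle and ParticlePool are hypothetical names.

```python
class Particle:
    def __init__(self):
        self.in_use = False
        self.x = self.y = 0.0


class ParticlePool:
    """Allocates every particle once and hands out free ones on request."""

    def __init__(self, size):
        self._particles = [Particle() for _ in range(size)]

    def create(self, x, y):
        for particle in self._particles:
            if not particle.in_use:
                particle.in_use = True
                particle.x, particle.y = x, y
                return particle
        return None  # pool exhausted; the caller decides what to do

    def release(self, particle):
        particle.in_use = False


pool = ParticlePool(2)
a = pool.create(1.0, 1.0)
b = pool.create(2.0, 2.0)
overflow = pool.create(3.0, 3.0)
pool.release(a)
c = pool.create(4.0, 4.0)  # reuses the object that was just released
```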
Much of software engineering is fighting against complexity.

20180129

Effective Python by Brett Slatkin

There are two major versions of Python still in active use: Python 2 and Python 3.
There are multiple popular run times for Python: CPython, Jython, IronPython, PyPy, etc.
Be sure that the command-line for running Python on your system is the version you expect it to be.
Prefer Python 3 for your next project because that is the primary focus of the Python community.
Always follow the PEP 8 style guide when writing Python code.
Sharing a common style with the larger Python community facilitates collaboration with others.
Using a consistent style makes it easier to modify your own code later.
In Python 3, bytes contains sequences of 8-bit values, str contains sequences of Unicode characters. bytes and str instances can't be used together with operators (like > or +).
In Python 2, str contains sequences of 8-bit values, unicode contains sequences of Unicode characters. str and unicode can be used together with operators if the str only contains 7-bit ASCII characters.
Use helper functions to ensure that the inputs you operate on are the type of character sequence you expect (8-bit values, UTF-8 encoded characters, Unicode characters, etc.).
If you want to read or write binary data to/from a file, always open the file using a binary mode (like 'rb' or 'wb').
Python's syntax makes it all too easy to write single-line expressions that are overly complicated and difficult to read.
Move complex expressions into helper functions, especially if you need to use the same logic repeatedly.
The if/else expression provides a more readable alternative to using Boolean operators like or and and in expressions.
Avoid being verbose: Don't supply 0 for the start index or the length of the sequence for the end index.
Slicing is forgiving of start or end indexes that are out of bounds, making it easy to express slices on the front or back boundaries of a sequence (like a[:20] or a[-20:]).
Assigning to a list slice will replace that range in the original sequence with what's referenced even if their lengths are different.
Specifying start, end, and stride in a slice can be extremely confusing.
Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible.
Avoid using start, end, and stride together in a single slice. If you need all three parameters, consider doing two assignments (one to slice, another to stride) or using islice from the itertools built-in module.
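The slicing points above can be checked quickly in an interpreter. The list contents below are arbitrary sample data.

```python
from itertools import islice

a = ["a", "b", "c", "d", "e", "f", "g", "h"]

first_four = a[:4]    # no need to write a[0:4]
last_three = a[-3:]   # no need to write a[-3:len(a)]
forgiving = a[:20]    # out-of-bounds end index is clamped, not an error

b = list(a)
b[2:7] = [99, 22]     # five items replaced by two; the list shrinks

evens = a[::2]               # stride in one step ...
middle_evens = evens[1:-1]   # ... then slice in a second step

lazy = list(islice(a, 2, 7, 2))  # start, stop, step via itertools
```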
List comprehensions are clearer than the map and filter built-in functions because they don't require extra lambda expressions.
List comprehensions allow you to easily skip items from the input list, a behavior map doesn't support without help from filter.
Dictionaries and sets also support comprehension expressions.
List comprehensions support multiple levels of loops and multiple conditions per loop level.
List comprehensions with more than two expressions are very difficult to read and should be avoided.
List comprehensions can cause problems for large inputs by using too much memory.
Generator expressions avoid memory issues by producing outputs one at a time as an iterator.
Generator expressions can be composed by passing the iterator from one generator expression into the for sub-expression of another.
Generator expressions execute very quickly when chained together.
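A short sketch contrasting comprehensions with chained generator expressions. The word list is made-up sample data; with a large file in its place, only the generator versions would avoid holding everything in memory.

```python
lines = ["alpha", "be", "gamma", "pi"]

lengths = [len(line) for line in lines]                  # replaces map
long_words = [line for line in lines if len(line) > 2]   # replaces filter
by_length = {line: len(line) for line in lines}          # dict comprehension

# Generator expressions produce one value at a time.
lengths_iter = (len(line) for line in lines)
# Chaining: the first generator feeds the for clause of the second.
roots_iter = ((n, n ** 0.5) for n in lengths_iter)

first = next(roots_iter)  # nothing was computed until this call
```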
enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
Prefer enumerate instead of looping over a range and indexing into a sequence.
You can supply a second parameter to enumerate to specify the number from which to begin counting (zero is the default).
The zip built-in function can be used to iterate over multiple iterators in parallel.
In Python 3, zip is a lazy generator that produces tuples. In Python 2, zip returns the full result as a list of tuples.
zip truncates its output silently if you supply it with iterators of different lengths.
The zip_longest function from the itertools built-in module lets you iterate over multiple iterators in parallel regardless of their lengths.
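A quick Python 3 illustration of enumerate, zip, and zip_longest, using invented sample lists of unequal length to show the silent truncation.

```python
from itertools import zip_longest

names = ["Cecilia", "Lise", "Marie"]
counts = [7, 4]

numbered = list(enumerate(names, 1))        # start counting at 1
paired = list(zip(names, counts))           # stops at the shorter input
padded = list(zip_longest(names, counts))   # fills missing values with None
```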
Python has special syntax that allows else blocks to immediately follow for and while loop interior blocks.
The else block after a loop only runs if the loop body did not encounter a break statement.
Avoid using else blocks after loops because their behavior isn't intuitive and can be confusing.
The try/finally compound statement lets you run cleanup code regardless of whether exceptions were raised in the try block.
The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks.
An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block.
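A sketch of how the four blocks of try/except/else/finally divide the work. The function divide_json and the events list are hypothetical; the list exists only to show which blocks ran.

```python
import json


def divide_json(text):
    """Parse JSON and divide two fields, recording which blocks executed."""
    events = []
    try:
        data = json.loads(text)  # only the call that may fail lives here
    except ValueError:
        events.append("invalid")
        result = None
    else:
        # Runs only if the try block succeeded.
        events.append("parsed")
        result = data["numerator"] / data["denominator"]
    finally:
        events.append("cleanup")  # runs in every case
    return result, events


good = divide_json('{"numerator": 1, "denominator": 4}')
bad = divide_json("not json")
```

Errors raised in the else block (a missing key or a zero denominator here) are not swallowed by the except clause; they propagate to the caller after the finally block runs.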
Functions that return None to indicate special meaning are error prone because None and other values (e.g. zero, the empty string) all evaluate to False in conditional expressions.
Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they're documented.
Closure functions can refer to variables from any of the scopes in which they were defined.
By default, closures can't affect enclosing scopes by assigning variables.
In Python 3, use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scopes.
In Python 2, use a mutable value (like a single-item list) to work around the lack of the nonlocal statement.
Avoid using nonlocal statements for anything beyond simple functions.
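The scoping rule is easiest to see with two tiny closures. Both functions below are hypothetical examples: the first silently creates a new local variable, the second uses nonlocal to rebind the enclosing one.

```python
def assign_without_nonlocal():
    count = 0

    def increment():
        count = 1  # creates a new local; the outer count is untouched

    increment()
    return count


def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable in the enclosing scope
        count += 1
        return count

    return increment


unchanged = assign_without_nonlocal()
counter = make_counter()
values = [counter(), counter(), counter()]
```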
Using generators can be clearer than the alternative for returning lists of accumulated results.
The iterator returned by a generator produces the set of values passed to yield expressions within the generator function's body.
Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn't include all inputs and outputs.
Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
Python's iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
You can easily define your own iterable container type by implementing the __iter__ method as a generator.
You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function.
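A sketch of an iterable container and the "iter twice" check, assuming a function that must loop over its input twice. Visits and normalize are illustrative names.

```python
class Visits:
    """Container: each call to __iter__ returns a fresh generator."""

    def __init__(self, values):
        self._values = values

    def __iter__(self):
        for value in self._values:
            yield value


def normalize(numbers):
    # An iterator returns itself from iter(); a container returns a new object.
    if iter(numbers) is iter(numbers):
        raise TypeError("Must supply a container, not an iterator")
    total = sum(numbers)  # first pass
    return [100 * value / total for value in numbers]  # second pass


percentages = normalize(Visits([15, 35, 80]))
```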
Functions can accept a variable number of positional arguments by using *args in the def statement.
You can use the items from a sequence as the positional arguments for a function with the * operator.
Using the * operator with a generator may cause your program to run out of memory and crash.
Adding new positional parameters to functions that accept *args can introduce hard-to-find bugs.
Function arguments can be specified by position or by keyword.
Keywords make it clear what the purpose of each argument is when it would be confusing with only positional arguments.
Keyword arguments with default values make it easy to add new behaviors to a function, especially when the function has existing callers.
Optional keyword arguments should always be passed by the keyword instead of by position.
Default arguments are only evaluated once: during function definition at module load time. This can cause odd behaviors for dynamic values (like {} or []).
Use None as the default value for keyword arguments that have a dynamic value. Document the actual default behavior in the function's doc-string.
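The mutable-default pitfall and the None idiom side by side. Both function names are made up for the example.

```python
def append_shared(item, target=[]):
    # The [] is evaluated once, at definition time, and shared by all calls.
    target.append(item)
    return target


def append_fresh(item, target=None):
    """Append item to target.

    Args:
        target: List to append to. Defaults to a new empty list per call.
    """
    if target is None:
        target = []
    target.append(item)
    return target


shared_first = list(append_shared(1))
shared_second = list(append_shared(2))  # still contains the 1 from before
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)
```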
Keyword arguments make the intention of a function call more clear.
Use keyword-only arguments to force callers to supply keyword arguments for potentially confusing functions, especially those that accept multiple Boolean flags.
Python 3 supports explicit syntax for keyword-only arguments in functions.
Python 2 can emulate keyword-only arguments for functions by using **kwargs and manually raising TypeError exceptions.
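In Python 3 the bare * in a signature marks everything after it as keyword-only. The function below is a sketch with two Boolean flags of the kind that are easy to mix up positionally.

```python
def safe_division(number, divisor, *,
                  ignore_overflow=False, ignore_zero_division=False):
    """Divide, optionally swallowing specific errors."""
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float("inf")
        raise


result = safe_division(1, 0, ignore_zero_division=True)

try:
    safe_division(1, 0, False, True)  # flags passed by position
    positional_rejected = False
except TypeError:
    positional_rejected = True
```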
Avoid making dictionaries with values that are other dictionaries or long tuples.
Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.
Move your bookkeeping code to use multiple helper classes when your internal state dictionaries get complicated.
Instead of defining and instantiating classes, functions are often all you need for simple interfaces between components in Python.
References to functions and methods in Python are first class, meaning they can be used in expressions like any other type.
The __call__ special method enables instances of a class to be called like plain Python functions.
When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure.
Python only supports a single constructor per class, the __init__ method.
Use @classmethod to define alternative constructors for your classes.
Use class method polymorphism to provide generic ways to build and connect concrete sub-classes.
Python's standard method resolution order (MRO) solves the problems of super-class initialization order and diamond inheritance.
Always use the super built-in function to initialize parent classes.
Avoid using multiple inheritance if mix-in classes can achieve the same outcome.
Use pluggable behaviors at the instance level to provide per-class customization when mix-in classes may require it.
Compose mix-ins to create complex functionality from simple behaviors.
Private attributes aren't rigorously enforced by the Python compiler.
Plan from the beginning to allow sub-classes to do more with your internal APIs and attributes instead of locking them out by default.
Use documentation of protected fields to guide sub-classes instead of trying to force access control with private attributes.
Only consider using private attributes to avoid naming conflicts with sub-classes that are out of your control.
Inherit directly from Python's container types (like list or dict) for simple use cases.
Beware of the large number of methods required to implement custom container types correctly.
Have your custom container types inherit from the interfaces defined in collections.abc to ensure that your classes match required interfaces and behaviors.
Define new class interfaces using simple public attributes, and avoid set and get methods.
Use @property to define special behavior when attributes are accessed on your objects, if necessary.
Follow the rule of least surprise and avoid weird side effects in your @property methods.
Ensure that @property methods are fast; do slow or complex work using normal methods.
Use @property to give existing instance attributes new functionality.
Make incremental progress toward better data models by using @property.
Consider refactoring a class and all call sites when you find yourself using @property too heavily.
Reuse the behavior and validation of @property methods by defining your own descriptor classes.
Use WeakKeyDictionary to ensure that your descriptor classes don't cause memory leaks.
Don't get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes.
Use __getattr__ and __setattr__ to lazily load and save attributes for an object.
Understand that __getattr__ only gets called once when accessing a missing attribute, whereas __getattribute__ gets called every time an attribute is accessed.
Avoid infinite recursion in __getattribute__ and __setattr__ by using methods from super() (i.e. object class) to access instance attributes directly.
Use meta-classes to ensure that sub-classes are well formed at the time they are defined, before objects of their type are constructed.
Meta-classes have slightly different syntax in Python 2 vs Python 3.
The __new__ method of meta-classes is run after the class statement's entire body has been processed.
Class registration is a helpful pattern for building modular Python programs.
Meta-classes let you run registration code automatically each time your base class is sub-classed in a program.
Using meta-classes for class registration avoids errors by ensuring that you never miss a registration call.
Meta-classes enable you to modify a class's attributes before the class is fully defined.
Descriptors and meta-classes make a powerful combination for declarative behavior and run-time introspection.
You can avoid both memory leaks and the weakref module by using meta-classes along with descriptors.
Use the subprocess module to run child processes and manage their input and output streams.
Child processes run in parallel with the Python interpreter, enabling you to maximize your CPU usage.
Use the timeout parameter with communicate to avoid deadlocks and hanging child processes.
Python threads can't run byte-code in parallel on multiple CPU cores because of the global interpreter lock (GIL).
Python threads are still useful despite the GIL because they provide an easy way to do multiple things at seemingly the same time.
Use Python threads to make multiple system calls in parallel. This allows you to do blocking I/O at the same time as computation.
Even though Python has a global interpreter lock, you're still responsible for protecting against data races between the threads in your programs.
Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without locks.
The Lock class in the threading built-in module is Python's standard mutual exclusion lock implementation.
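A sketch of protecting a shared counter with threading.Lock. LockingCounter and worker are hypothetical names, and the thread and iteration counts are arbitrary.

```python
from threading import Lock, Thread


class LockingCounter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        # The read-modify-write below is not atomic without the lock.
        with self.lock:
            self.count += offset


def worker(counter, how_many):
    for _ in range(how_many):
        counter.increment(1)


counter = LockingCounter()
threads = [Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With the lock held around each update, the final count matches the number of increments performed across all threads.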
Pipelines are a great way to organize sequences of work that run concurrently using multiple Python threads.
Be aware of the many problems in building concurrent pipelines: busy waiting, stopping workers, and memory explosion.
The Queue class has all of the facilities you need to build robust pipelines: blocking operations, buffer sizes, and joining.
Co-routines provide an efficient way to run tens of thousands of functions seemingly at the same time.
Within a generator, the value of the yield expression will be whatever value was passed to the generator's send method from the exterior code.
Co-routines give you a powerful tool for separating the core logic of your program from its interaction with the surrounding environment.
Python 2 doesn't support yield from or returning values from generators.
Moving CPU bottlenecks to C-extension modules can be an effective way to improve performance while maximizing your investment in Python code. However, the cost of doing so is high and may introduce bugs.
The multiprocessing module provides powerful tools that can parallelize certain types of Python computations with minimal effort.
The power of multiprocessing is best accessed through the concurrent.futures built-in module and its simple ProcessPoolExecutor class.
The advanced parts of the multiprocessing module should be avoided because they are so complex.
Decorators are Python syntax for allowing one function to modify another function at run-time.
Using decorators can cause strange behaviors in tools that do introspection, such as debuggers.
Use the wraps decorator from the functools built-in module when you define your own decorators to avoid any issues.
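A small decorator sketch showing what functools.wraps preserves. The trace decorator and its calls attribute are invented for the example.

```python
from functools import wraps


def trace(func):
    """Record every call made to func."""
    calls = []

    @wraps(func)  # copies __name__, __doc__, and other metadata to wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        calls.append((args, kwargs, result))
        return result

    wrapper.calls = calls
    return wrapper


@trace
def add(a, b):
    """Return a + b."""
    return a + b


total = add(1, 2)
```

Without @wraps, add.__name__ would report "wrapper", which is the kind of surprise that confuses debuggers and other introspection tools.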
The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements.
The value yielded by context managers is supplied to the as part of the with statement. It's useful for letting your code directly access the cause of the special context.
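A sketch of a context manager built with contextlib.contextmanager that temporarily changes a logger's level and yields the logger to the as target. The logger name "demo" is arbitrary.

```python
import logging
from contextlib import contextmanager


@contextmanager
def log_level(level, name):
    logger = logging.getLogger(name)
    old_level = logger.level
    logger.setLevel(level)
    try:
        yield logger  # this value is bound by "as" in the with statement
    finally:
        logger.setLevel(old_level)  # restored even if the body raises


with log_level(logging.DEBUG, "demo") as logger:
    level_inside = logger.level

level_after = logging.getLogger("demo").level
```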
The pickle built-in module is only useful for serializing and deserializing objects between trusted programs.
The pickle module may break down when used for more than trivial use cases.
Use the copyreg built-in module with pickle to add missing attribute values, allow versioning of classes, and provide stable import paths.
Avoid using the time module for translating between different time zones.
Use the datetime built-in module along with the pytz module to reliably convert between times in different time zones.
Always represent time in UTC and do conversions to local time as the final step before presentation.
Use Python's built-in modules for algorithms and data structures.
Don't re-implement this functionality yourself. It's hard to get right.
Python has built-in types and classes in modules that can represent practically every type of numerical value.
The Decimal class is ideal for situations that require high precision and exact rounding behavior, such as computations of monetary values.
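A short illustration of why Decimal suits money. The rate and duration are made-up numbers; constructing Decimal from strings avoids inheriting float rounding error.

```python
from decimal import Decimal, ROUND_UP

float_mismatch = (0.1 + 0.2 != 0.3)  # binary floats cannot represent 0.1 exactly
decimal_match = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

rate = Decimal("1.45")    # hypothetical price per minute
seconds = Decimal("222")
cost = rate * seconds / Decimal("60")

# Round up to the nearest cent with explicit rounding behavior.
rounded = cost.quantize(Decimal("0.01"), rounding=ROUND_UP)
```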
The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
pip is the command-line tool to use for installing packages from PyPI.
pip is installed by default in Python 3.4 and above; you must install it yourself for older versions.
The majority of PyPI modules are free and open source software.
Write documentation for every module, class, and function using doc-strings. Keep them up to date as your code changes.
For modules: Introduce all contents of the module and any important classes or functions all users should know about.
For classes: Document behavior, important attributes, and subclass behavior in the doc-string following the class statement.
For functions and methods: Document every argument, returned value, raised exception, and other behaviors in the doc-string following the def statement.
Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting name-spaces with unique absolute module names.
Simple packages are defined by adding an __init__.py file to a directory that contains other source files. These files become the child modules of the directory's package. Package directories may also contain other packages.
You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute.
You can hide a package's internal implementation by only importing public names in the package's __init__.py file or by naming internal-only members with a leading underscore.
When collaborating within a single team or on a single code-base, using __all__ for explicit APIs is probably unnecessary.
Defining root exceptions for your modules allows API consumers to insulate themselves from your API.
Catching root exceptions can help you find bugs in code that consumes an API.
Catching the Python Exception base class can help you find bugs in API implementations.
Intermediate root exceptions let you add more specific types of exceptions in the future without breaking your API consumers.
Circular dependencies happen when two modules must call into each other at import time. They can cause your program to crash at startup.
The best way to break a circular dependency is refactoring mutual dependencies into a separate module at the bottom of the dependency tree.
Dynamic imports are the simplest solution for breaking a circular dependency between modules while minimizing refactoring and complexity.
Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
Virtual environments are created with pyvenv, enabled with source bin/activate, and disabled with deactivate.
You can dump all of the requirements of an environment with pip freeze. You can reproduce the environment by supplying the requirements.txt file to pip install -r.
In versions of Python before 3.4, the pyvenv tool must be downloaded and installed separately. The command-line tool is called virtualenv instead of pyvenv.
Programs often need to run in multiple deployment environments that each have unique assumptions and configurations.
You can tailor a module's contents to different deployment environments by using normal Python statements in module scope.
Module contents can be the product of any external condition, including host introspection through the sys and os modules.
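A sketch of tailoring module contents at import time might look like this. The setting names DEFAULT_CACHE_DIR and DEBUG, the "myapp" directory, and the MYAPP_ENV variable are all assumptions for illustration; only sys.platform and os.environ are real interfaces.

```python
import os
import sys

# Ordinary statements at module scope decide what the module contains.
if sys.platform.startswith("win32"):
    DEFAULT_CACHE_DIR = os.path.join(os.environ.get("TEMP", "C:\\Temp"), "myapp")
else:
    DEFAULT_CACHE_DIR = os.path.join(os.environ.get("TMPDIR", "/tmp"), "myapp")

# An environment variable (assumed name) switches on development behaviour.
DEBUG = os.environ.get("MYAPP_ENV", "production") == "development"
```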
Calling print on built-in Python types will produce the human-readable string version of a value, which hides type information.
Calling repr on built-in Python types will produce the printable string version of a value. These repr strings could be passed to the eval built-in function to get back the original value.
%s in format strings will produce human-readable strings like str. %r will produce printable strings like repr.
You can define the __repr__ method to customize the printable representation of a class and provide more detailed debugging information.
You can reach into any object's __dict__ attribute to view its internals.
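The difference between the human-readable and printable forms is easiest to see side by side. In this small sketch the Point class is a hypothetical example type.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Printable form that shows how to rebuild the object.
        return f"Point({self.x!r}, {self.y!r})"


text = "5"
human = str(text)                       # looks like the number 5 when printed
printable = repr(text)                  # the quotes reveal that it is a string
round_trip = eval(repr(text))           # repr of a built-in value evaluates back
formatted = "%s vs %r" % (text, text)   # %s behaves like str, %r like repr

point = Point(1, 2)
print(human, printable, formatted, point, point.__dict__)
```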
The only way to have confidence in a Python program is to write tests.
The unittest built-in module provides most of the facilities you'll need to write good tests.
You can define tests by sub-classing TestCase and defining one method per behavior you'd like to test. Test methods on TestCase classes must start with the word test.
It's important to write both unit tests (for isolated functionality) and integration tests (for modules that interact).
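A minimal TestCase sketch, assuming a hypothetical to_title function as the code under test. The suite is loaded and run explicitly here so the example does not exit the interpreter the way unittest.main() would.

```python
import unittest


def to_title(text):
    """Hypothetical function under test."""
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    return text.title()


class ToTitleTest(unittest.TestCase):
    # Each method whose name starts with "test" covers one behavior.
    def test_capitalizes_each_word(self):
        self.assertEqual(to_title("hello world"), "Hello World")

    def test_rejects_non_strings(self):
        with self.assertRaises(TypeError):
            to_title(42)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToTitleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

From the command line, the same tests would normally be discovered with python -m unittest.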
You can initiate the Python interactive debugger at a point of interest directly in your program with the import pdb; pdb.set_trace() statements.
The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program.
pdb shell commands let you precisely control program execution, allowing you to alternate between inspecting program state and progressing program execution.
It's important to profile Python programs before optimizing because the source of slowdowns is often obscure.
Use the cProfile module instead of the profile module because it provides more accurate profiling information.
The Profile object's runcall method provides everything you need to profile a tree of function calls in isolation.
The Stats object lets you select and print the subset of profiling information you need to see to understand your program's performance.
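The runcall and Stats workflow can be sketched as follows. The insertion sort is just an assumed, deliberately slow workload; any function could be profiled the same way.

```python
import cProfile
import io
import pstats


def insert_value(array, value):
    """Linear-scan insertion into a sorted list (deliberately slow)."""
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)


def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result


data = [(i * 7919) % 1000 for i in range(1000)]

# Profile only the call tree rooted at insertion_sort.
profiler = cProfile.Profile()
sorted_data = profiler.runcall(insertion_sort, data)

# Select and print the slice of the results worth reading.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the whole call tree at the top of the report.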
It can be difficult to understand how Python programs use and leak memory.
The gc module can help you understand which objects exist, but it has no information about how they were allocated.
The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.
tracemalloc is only available in Python 3.4 and above.
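A small sketch of comparing two tracemalloc snapshots. The retained list is an assumed workload that keeps memory alive so there is something to report.

```python
import tracemalloc

tracemalloc.start(10)                 # keep up to 10 stack frames per allocation
before = tracemalloc.take_snapshot()

# Hypothetical workload: hold on to many strings so the allocations stay live.
retained = [str(i) * 10 for i in range(50_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Compare the snapshots grouped by source line, biggest change first.
top_stats = after.compare_to(before, "lineno")
biggest = top_stats[0]
print(biggest)
```

Each entry names the file and line responsible for the allocation, which is the information the gc module alone cannot provide.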

20180128

Running Lean by Ash Maurya

Practice trumps theory.
Today's companies can build anything they can imagine. So the question we are called on to answer is no longer primarily, "can it be built?", but rather, "should it be built?"
Successful new products require constant, disciplined, experimentation--in the scientific sense--in order to discover new sources of profitable growth. This is true for the tiniest startup as well as for the most established company.
Most startups still fail. But the more interesting fact is that, of those startups that succeed, two-thirds report having drastically changed their plans along the way.
So, what separates successful startups from unsuccessful ones is not necessarily the fact that successful startups began with a better initial plan, but rather that they find a plan that works before running out of resources.
Running lean is a systematic process for iterating from Plan A to a plan that works, before running out of resources.
The key takeaway from Customer Development can best be summed up as: Get out of the building.
The key takeaway from Lean Startup can best be summed up around the concept of using smaller, faster iterations for testing a vision.
Startups that succeed are those that manage to iterate enough times before running out of resources.
Startups are inherently chaotic, but at any given point in time, there are only a few key actions that matter. You need to just focus on those and ignore the rest.
You get a gold star not for following a process, but for achieving results.
No methodology can guarantee success. But a good methodology can provide a feedback loop for continuous improvement and learning.
Principles guide what you do. Tactics show you how.
The essence of running lean can be distilled into three steps:

1. Document your Plan A.
2. Identify the riskiest parts of your plan.
3. Systematically test your plan.

Reasonably smart people can rationalize anything, but entrepreneurs are especially gifted at this.
The first step is writing down your initial vision and then sharing it with at least one other person.
A single-page business model is much easier to share with others, which means it will be read by more people and probably will be more frequently updated.
A key point I would like you to take away for now, though, is that your product is NOT "the product" of your startup.
Customers don't care about your solution. They care about their problems.
Your job isn't just building the best solution, but owning the entire business model and making all the pieces fit.
Building a successful product is fundamentally about risk mitigation.
Startups are a risky business, and our real job as entrepreneurs is to systematically de-risk our startups over time.
The bigger risk for most startups is building something nobody wants.
A startup goes through three distinct stages:

1. problem/solution fit
2. product/market fit
3. scale

The first stage is about determining whether you have a problem worth solving before investing months or years of effort into building a solution.
While ideas are cheap, acting on them is quite expensive.
A problem worth solving boils down to three questions:

1. Is it something customers want?
2. Will they pay for it? If not, who will?
3. Can it be solved?

Once you have a problem worth solving and your MVP has been built, you then test how well your solution solves the problem. In other words, you measure whether you have built something people want.
After product/market fit, some level of success is almost always guaranteed. Your focus at this stage shifts toward growth, or scaling your business model.
Before product/market fit, the focus of the startup centers on learning and pivots. After product/market fit, the focus shifts towards growth and optimizations.
Pivot is a term used by Eric Ries to describe a change in direction of a startup while staying grounded in learning.
The best way to differentiate pivots from optimizations is that pivots are about finding a plan that works, while optimizations are about accelerating that plan.
You stand to learn the most when the probability of the expected outcome is 50%; that is, when you don't know what to expect.
In order to maximize learning, you have to pick bold outcomes instead of chasing incremental improvements.
Your first goal should be to establish just enough of a runway to allow you to start testing and validating your business model with customers.
bootstrapping + lean startup = low-burn startup
The lean startup methodology is strongly rooted in the scientific method, and running experiments is a key activity.
A cycle around the validated learning loop is called an experiment.
A book, like large software, is never finished--only released.
Capture your business model in a portable, one-page diagram.
Hill climbing is good for finding a local optimum, but it is not guaranteed to find the best possible solution out of all possible solutions.
A customer is someone who pays for your product. Identify your customers.
You can't effectively build, design, and position a product for everyone.
Business plans try too hard to predict the future, which is impossible. Instead, write your canvas with a "getting things done" attitude.
Doing nothing could also be a viable alternative for a customer if the pain is not acute enough.
Unique Value Proposition: Why you are different and worth getting attention.
Be different, but make sure your difference matters.
The key to unlocking what's different about your product is deriving your UVP directly from the number-one problem you are solving. If that problem is indeed worth solving, you're more than halfway there already.
Pick your words carefully and own them.
Words are key to any great marketing and branding campaign.
Picking a few "key" words that you consistently use also drives your search engine optimization (SEO) ranking.
Bind a solution to your problem as late as possible.
Failing to build a significant path to customers is among the top reasons why startups fail.
The initial goal of a startup is to learn, not to scale. So, at first it's OK to rely on any channels that get you in front of potential customers.
When you don't yet have a tested value proposition, it's hard to justify spending marketing dollars or effort on outbound messaging.
First sell manually, then automate.
You have to first sell your product yourself, before letting others do it.
While referral programs can be very effective in spreading the word about your product, you need to have a product worth spreading first.
Your MVP should address not only the top problems customers have identified as being important to them, but also the problems that are worth solving. By that definition, you should plan to deliver enough value to justify charging.
I believe that if you intend to charge for your product, you should charge from day one.
Price is part of the product.
Every business has a few key numbers that can be used to measure how well it is performing. These numbers are key for both measuring progress and identifying hot spots in your customer life cycle.
Retention measures "repeated use" and/or engagement with your product.
An interesting perspective to keep in mind is that anything worth copying will be copied, especially once you start to demonstrate a viable business model.
Incorrect prioritization of risk is one of the top contributors of waste.
Uncertainty: The lack of complete certainty, that is, the existence of more than one possibility.
Risk: A state of uncertainty where some of the possibilities involve a loss, catastrophe, or other undesirable outcome.
The way you quantify risk in your business model is by quantifying the probabilities of a specific outcome along with quantifying the associated loss if you're wrong.
Your objective is to find a model with a big enough market you can reach with customers who need your product that you can build a business around.
The ideal problem/solution team size is two or three people.
There are many arguments for building your release 1.0 with a small team:

1. Communication is easier.
2. You build less.
3. You keep costs low.

While it is possible to build a product by yourself, I highly recommend working with at least one other person who can, at a minimum, help to enforce periodic reality checks.
More important than the number of members is ensuring that you have the right talents within the team to iterate quickly.
If you are building a product, you need strong product development skills on your team. Having prior experience building stuff is key, along with expertise in the specific technology you are using.
The one thing you should never outsource is learning about customers.
Challenge yourself to find the simplest thing you can do to test a hypothesis. This is an underappreciated skill. Once you truly understand what's riskiest about your product, it's often possible to build something other than the product to test it.
A falsifiable hypothesis is a statement that can be clearly proven wrong.
Falsifiable Hypothesis = [Specific Repeatable Action] will [Expected Measurable Outcome]
Company-wide dashboards are great for on-the-ground tactical analysis, but it is equally important to report on your learning milestones at a strategic level.
Maximize learning (about what's riskiest) per unit time.
Product risk: Getting the product right

1. First make sure you have a problem worth solving.
2. Then define the smallest possible solution (MVP).
3. Build and validate your MVP at small scale (demonstrate UVP).
4. Then verify it at large scale.

Customer risk: Building a path to customers

1. First identify who has the pain.
2. Then narrow this down to early adopters who really want your product now.
3. It's OK to start with outbound channels.
4. But gradually build/develop scalable inbound channels--the earlier the better.

Market risk: Building a viable business

1. Identify competition through existing alternatives and pick a price for your solution.
2. Test pricing first by measuring what customers say (verbal commitments).
3. Then test pricing by what customers do.
4. Optimize your cost structure to make the business model work.

The fastest way to learn is to talk to customers. Not releasing code, or collecting analytics, but talking to people.
Surveys assume you know the right questions to ask.
Customer interviews are about exploring what you don't know you don't know.
The best initial learning comes from "open-ended" questions.
The problem with focus groups is that they quickly devolve into "group think", which is wrong for most products.
While surveys are bad at supporting initial learning, they can be quite effective at verifying what you learn from customer interviews.
The customer development battle cry, "Get out of the building", codified by Steve Blank, is simultaneously one of the most basic and difficult practices to implement.
Don't ask customers what they want. Measure what they do.
Life is too short to keep building something nobody (or not enough people) want.
Scratching your own itch is a great way to get started, but you still need to validate that you have a problem worth solving by talking to other people.
When you are able to nail the customer's problem and help him visualize a viable solution, he will buy from you, provided that you remove other objections--for example, by providing a trial period, making it easy to cancel, and so on.
Understand your customer's worldview before formulating a solution.
Understanding your early adopters' existing alternatives is key to formulating the right product. Early adopters will use their existing alternatives as anchors against which they will judge your solution, pricing, and positioning.
The best way to uncover the "key" words to use in your MVP is to listen closely to how customers describe their workflow.
Most customers are great at articulating problems but not at visualizing solutions.
The more real your demo looks, the more accurately you'll be able to test your solution.
You can't (and shouldn't) convince a customer that she has a must-have problem, but you often can (and should) convince a customer to pay a "fair" price for your product that is usually higher than what both you and the customer think it is.
The most effective way to get noticed is to nail a customer problem.
Usually the right price is one the customer accepts, but with a little resistance.
Don't ask the customer for ballpark pricing. Instead, tell him your pricing model and gauge his response immediately afterward.
Reducing the scope of your MVP not only shortens your development cycle, but also removes unnecessary distractions that dilute your product's messaging.
Your MVP should be like a great reduction sauce--concentrated, intense, and flavorful.
Don't automatically assume that any features have to be included in your MVP. Start with a clean slate and justify the addition of each one.
Start with your number-one problem.
The job of your unique value proposition (UVP) is to make a compelling promise.
The job of the MVP is to deliver on that promise.
All your energy needs to be channeled toward accelerating learning. Speed is key.
Chances are quite high that you will not have a scaling problem when you launch.
Continuous deployment is a practice of releasing software continuously throughout the day--in minutes versus days, weeks, or months.
Continuous flow has been shown to boost productivity by rearranging manufacturing processes so that products are built end-to-end, one at a time, versus the more prevalent batch-and-queue approach.
The biggest waste in manufacturing is created from having to transport products from one place to another. The biggest waste in software is created from waiting for software as it moves from one state to another: waiting to code, waiting to test, waiting to deploy. Reducing or eliminating these wait times leads to faster iterations, which is the key to success.
Your activation flow describes the path customers take from signing up for your service to having a gratifying first experience.
Reduce sign up friction, but not at the expense of learning.
It is generally a good practice to keep your sign up forms short and only collect what you absolutely need, but don't shy away from asking for critical contact information up front.
The purpose of your marketing website is simple: to sell your product.
Your marketing website is critical in driving the acquisition trigger in your customer life cycle.
Acquisition describes the path a customer takes from first landing on your website as an unaware visitor to becoming an interested prospect.
The landing page is by far the hardest of the three. Its job is to make a case for your product to an unaware visitor in fewer than eight seconds.
Every page needs to have a single, clear call to action. It should stand out and set a clear expectation as to what happens next.
An actionable metric is one that ties specific and repeatable actions to observed results.
The opposite of actionable metrics are vanity metrics, which only serve to document the current state of the product but offer no insight (by themselves) into how you got there or what to do next.
Metrics can help you identify where things are going wrong, but they can't tell you why. You need to talk to people for that.
Before selling your minimum viable product (MVP) to strangers through your distribution channel, sell it face to face to friendly early adopters. Learn from them. Then refine your design, positioning, and pricing for launch.
If you can't convert a warm prospect in a 20-minute face-to-face interview, it will be much harder to convert a visitor in less than eight seconds on your landing page.
The fastest way to learn from customers is to talk to them.
Contrary to popular belief, you won't be bombarded with phone calls.
Email is a very effective (and often underutilized) medium for engaging your customers. Everyone has an email address. Email can be automated, tracked, and measured.
You stand to learn as much (if not more) from your lost sales as you do from your sales.
Your goal is to establish "just enough" traffic to support learning.
Simple products are simple to understand.
Building great software is hard.
First troubleshoot and resolve issues with existing features before chasing new features.
Put down the compiler until you learn why they're not buying.
Features always have hidden costs.
More features mean more tests, more screenshots, more videos, more coordination, more complexity, and more distractions.
A good rule of thumb for prioritizing focus is to implement an 80/20 rule.
Most of your time immediately after launch should be spent measuring and improving existing features versus chasing after shiny new features.
A key principle of Kanban that works to constrain the work queue involves setting limits on the number of features that can be in progress at any given time. This allows you to maximize throughput while minimizing waste.
In a lean startup, a feature is only "done" when it provides validated learning from customers.
Because you have a finite work-in-progress limit, you need to carefully prioritize your backlog queue against your product's immediate goals.
Once you have identified a feature, the first step is to test to see if the problem is worth solving. If you can't justify building the feature, kill it immediately.
Surveys are more effective at verification than learning.
Achieving product/market fit or traction can fundamentally be reduced to building something people want or, in other words, delivering on your UVP.
You have early traction when you are retaining 40% of your activated users, month after month.
While revenue is the first form of validation, retention is the ultimate form of validation.
Focusing on scaling your business before you can demonstrate early traction is a form of waste.
Once you can demonstrate early traction, your focus should shift toward achieving sustainable growth.
Churn rate is the fraction of customers who leave or fail to remain engaged with a product after a given time period.
Viral coefficient measures the number of converted referrals per customer.
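Both definitions reduce to simple ratios. The sketch below assumes the counts are measured over the same time period and the same cohort; the function names and the sample numbers are made up for illustration.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customers who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("need at least one customer at the start")
    return customers_lost / customers_at_start


def viral_coefficient(customers, converted_referrals):
    """Converted referrals generated per existing customer."""
    if customers <= 0:
        raise ValueError("need at least one customer")
    return converted_referrals / customers


# Illustrative numbers only: 200 customers, 30 of whom left this month.
monthly_churn = churn_rate(200, 30)
monthly_retention = 1 - monthly_churn

# Illustrative numbers only: 100 customers brought in 40 new paying users.
k = viral_coefficient(100, 40)
print(monthly_churn, monthly_retention, k)
```

A viral coefficient above 1 would mean each customer brings in more than one new customer on average.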
What's stopping your business from growing 10x?
Every product has to start by demonstrating and delivering a basic value proposition to customers.
Study your baseline customer life cycle to identify any particular usage patterns.
Once you've selected your key engine of growth, put a stake in the ground: Declare the key metric and improvement you want to achieve. Then, align your next set of experiments toward that goal.
Once your value metrics are validated at a micro scale, you need to race toward your critical network tipping point using the viral engine of growth. Once there, you can look to validate your attention currency through means such as advertising, premium memberships, or something else.
Matching buyers and sellers is a hard problem.
Getting to product/market fit is the first significant milestone of a startup. At this stage, some level of success is almost guaranteed and your focus can now shift from learning to scaling.
Every process works well until you add people.
The key is to build a continuous learning culture of experimenters versus specialists, where it's everyone's job to be accountable toward creating and capturing customer value.
At every stage of the startup, there are a set of actions that are "right" for the startup, in that they maximize return on time, money, and effort. A lean/bootstrapped entrepreneur ignores all else.
Seed stage investors are just as bad at guessing what products will succeed as you are. Without any product validation to rely on, they hedge their bets against your team's track record and storytelling ability.
Time is more valuable than money.
Constraints drive innovation, but more important, they force action.
With less money, you are forced to build less, get it out faster, and learn faster.
Until you find a problem worth solving, it really doesn't make sense to quit your day job.
The biggest burn in a software business is people. Hardware is cheap. Rent, don't buy. Don't scale until you have a scaling problem. Don't hire until it hurts.
Make a goal of first covering your hardware/hosting costs, and then your people costs.
It is very tempting to take on unrelated consulting to survive, but it becomes very hard (if not outright impossible) to build a great product in parallel. Instead, look for other related stuff that you can sell along the way.
In a lean startup, eliminating waste is a fundamental principle.
Waste is any human activity which absorbs resources but creates no value.
Of all resources, there is no resource more valuable than time. Time is more valuable than money. While money can fluctuate up or down, time only moves in one direction.
Managers typically organize their day into one-hour blocks, and spend each hour dealing with a different task. Makers, like programmers and writers, need to organize their day into longer blocks of uninterrupted time. The cost of context switching is low (and expected) in a manager's schedule. It is high (and a productivity killer) in a maker's schedule.
Establish uninterruptible time blocks for maker work.
Achieve maker goals as early in the day as possible.
Schedule manager activities as late in the day as possible.
Always be ready for unplanned activities like customer support.
Iterate around only three to five actionable metrics.
A few actionable metrics are all you need to identify and prioritize the most critical issues to tackle.
Building software to specifications is hard enough that, when faced with a startup environment where both problems and solutions are largely unknown, it is optimal to iterate around less code and more learning.
Avoid overproduction by making customers pull for features.
Eighty percent of your effort should be spent toward optimizing existing features versus building new ones.
Time-based trials help time-box your pricing experiments so that you can force a conversion decision, which allows you to learn and iterate faster.
Many services make the mistake of giving away too much under their free plans, which leads to very low or no conversions. One reason for this is that creatives are especially known to undervalue their own work and are really bad at setting pricing.
Pricing should be set with the buyer in mind, not the seller.
Your free users are not your customers (yet).
Free users aren't "free".
Even though the operational costs of carrying a free user may seem low, they aren't zero.
Since your eventual goal is to charge for your product anyway, why not start there? Pick features and a plan based on what customers will pay for today and sign them on as your first customers. Not only is this simpler to build, but it's also simpler to measure.
The number-one way to get a prospect to agree to an interview is to "nail his problem".
The purpose of each sentence should be to get the next one read.
Code in smaller batches.
The basic idea here is to deploy less code but more frequently. The definition of a small batch is relative, but strive to make it as small as possible.
Another practice for reducing work-in-progress inventory is to not use any branching in your source control tree.
The longer you stay off the trunk, the more integration debt you collect, which again inevitably leads to more integration risk, coordination, and planning headaches.
Testing is everyone's responsibility.
Outsource as much of your server infrastructure as possible.
Spending effort setting up and configuring your own servers at this stage is a form of waste. You should instead pick a cloud or platform provider and focus all your efforts on building your application, not your infrastructure.
Deploy manually first, then automate.
A feature flipper system uses flags in your code that allow you to enable/disable features on a per-user basis.
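A per-user feature flipper can be sketched in a few lines. The FeatureFlags class, the "new_checkout" flag, and the in-memory storage are assumptions for illustration; a real system would likely persist the flags somewhere shared.

```python
class FeatureFlags:
    """In-memory per-user feature flags (hypothetical minimal design)."""

    def __init__(self):
        self._enabled = {}   # feature name -> set of user ids

    def enable(self, feature, user_id):
        self._enabled.setdefault(feature, set()).add(user_id)

    def disable(self, feature, user_id):
        self._enabled.get(feature, set()).discard(user_id)

    def is_enabled(self, feature, user_id):
        return user_id in self._enabled.get(feature, set())


flags = FeatureFlags()
flags.enable("new_checkout", user_id=42)


def checkout_page(user_id):
    # The flag check is the only place the code path branches.
    if flags.is_enabled("new_checkout", user_id):
        return "new checkout"
    return "old checkout"
```

Both code paths are deployed at once; the flag decides which one a given user sees, so a feature can be switched off without a new release.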
The Pareto Principle: Roughly 80% of the effects come from 20% of the causes.
The continuous deployment cycle has a built-in feedback loop that helps you build this monitoring incrementally.
The Five Whys is a question-asking method used to explore the cause/effect relationships underlying a particular problem. Ultimately, the goal of applying the Five Whys method is to determine a root cause of a defect or problem.
A key design principle is to decouple data collection from data visualization.
Log everything.
A good practice to complement tracking raw events is to log every "potentially interesting" property along with each event.
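One way to picture the advice about decoupling collection from visualization and logging properties with each event is the sketch below. The log_event and count_by helpers, the list used as a sink, and the event and property names are all assumptions for illustration.

```python
import json
import time


def log_event(sink, name, **properties):
    """Collection: append one raw event, with its properties, as a JSON line."""
    record = {"event": name, "timestamp": time.time(), **properties}
    sink.append(json.dumps(record, sort_keys=True))
    return record


def count_by(sink, key):
    """Reporting: computed later from raw events, independent of collection."""
    counts = {}
    for line in sink:
        value = json.loads(line).get(key)
        counts[value] = counts.get(value, 0) + 1
    return counts


events = []
log_event(events, "signup", plan="free", referrer="blog")
log_event(events, "signup", plan="paid")
by_plan = count_by(events, "plan")
print(by_plan)
```

Because the raw events keep every property, new reports can be added later without changing what is collected.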
Every product has a core set of user actions that track ongoing representative usage.

20180127

xv6 by Russ Cox, Frans Kaashoek, and Robert Morris

xv6: a simple, unix-like teaching operating system

The job of an operating system is to share a computer among multiple programs and to provide a more useful set of services than the hardware alone supports.
An operating system provides services to user programs through an interface. Designing a good interface turns out to be difficult.
xv6 takes the traditional form of a kernel, a special program that provides services to running programs.
Each running program, called a process, has memory containing instructions, data, and a stack. The instructions implement the program's computation. The data are the variables on which the computation acts. The stack organizes the program's procedure calls.
When a process needs to invoke a kernel service, it invokes a procedure call in the operating system interface. Such a procedure call is called a system call. The system call enters the kernel; the kernel performs the service and returns. Thus a process alternates between executing in user space and kernel space.
The kernel uses the CPU's hardware protection mechanisms to ensure that each process executing in user space can access only its own memory.
When a user program invokes a system call, the hardware raises the privilege level and starts executing a pre-arranged function in the kernel.
The shell is an ordinary program that reads commands from the user and executes them.
An xv6 process consists of user-space memory (instructions, data, and stack) and per-process state private to the kernel.
When a process is not executing, xv6 saves its CPU registers, restoring them when it next runs the process.
A process may create a new process using the fork system call. Fork creates a new process, called the child process, with exactly the same memory contents as the calling process, called the parent process.
The exit system call causes the calling process to stop executing and to release resources such as memory and open files.
Although the child has the same memory contents as the parent initially, the parent and child are executing with different memory and different registers: changing a variable in one does not affect the other.
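xv6 itself is written in C, but the same behavior can be observed from Python on a Unix-like system, since os.fork wraps the underlying system call. This is a sketch under that assumption; the variable names are made up, and the exit status is used only as a convenient way to report the child's copy of the variable.

```python
import os

value = 10
pid = os.fork()
if pid == 0:
    # Child: starts with a copy of the parent's memory, so value is 10 here.
    value += 5
    os._exit(value)   # report the child's copy through its exit status

# Parent: wait for the child, then decode the exit status it reported.
_, status = os.waitpid(pid, 0)
child_value = os.WEXITSTATUS(status)
print("parent still has", value, "while the child had", child_value)
```

The parent's value is unchanged because the child modified its own copy of memory, not the parent's.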
The exec system call replaces the calling process's memory with a new memory image loaded from a file stored in the file system.
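The usual fork-then-exec pattern can be sketched from Python on a Unix-like system, assuming os.fork and os.execv as stand-ins for the xv6 calls. The child program here is simply another Python interpreter that exits with status 7, chosen only so there is something to observe.

```python
import os
import sys

pid = os.fork()
if pid == 0:
    # Child: replace this process's memory image with a new program.
    try:
        os.execv(sys.executable,
                 [sys.executable, "-c", "import sys; sys.exit(7)"])
    finally:
        # Reached only if exec failed; a successful exec never returns.
        os._exit(127)

# Parent: collect the exit status of whatever program the child became.
_, status = os.waitpid(pid, 0)
exit_code = os.WEXITSTATUS(status)
print("child program exited with", exit_code)
```

Keeping fork and exec separate gives the child a window to adjust its own state, such as file descriptors, before the new program starts.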
xv6 does not provide a notion of users or of protecting one user from another; in Unix terms, all xv6 processes run as root.
A file descriptor is a small integer representing a kernel-managed object that a process may read from or write to.
Internally, the xv6 kernel uses the file descriptor as an index into a per-process table, so that every process has a private space of file descriptors starting at zero.
The read and write system calls read bytes from and write bytes to open files named by file descriptors.
The dup system call duplicates an existing file descriptor, returning a new one that refers to the same underlying I/O object.
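The shared-offset behavior of dup can be checked with Python's thin wrappers over the same calls, assuming a Unix-like system. The temporary file is just a convenient I/O object for the demonstration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
dup_fd = os.dup(fd)              # second descriptor, same underlying file

os.write(fd, b"hello ")
os.write(dup_fd, b"world\n")     # continues where the first write stopped

# Both descriptors share one offset, so seeking through fd moves dup_fd too.
os.lseek(fd, 0, os.SEEK_SET)
contents = os.read(dup_fd, 100)

os.close(fd)
os.close(dup_fd)
os.unlink(path)
print(contents)
```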
File descriptors are a powerful abstraction, because they hide the details of what they are connected to: a process writing to file descriptor 1 may be writing to a file, to a device like the console, or to a pipe.
A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Writing data to one end of the pipe makes that data available for reading from the other end of the pipe. Pipes provide a way for processes to communicate.
Pipes have at least four advantages over temporary files:

Pipes automatically clean themselves up.
Pipes can pass arbitrarily long streams of data.
Pipes allow for parallel execution of pipeline stages.
If you are implementing inter-process communication, pipes' blocking reads and writes are more efficient than the non-blocking semantics of files.
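A pipe between a parent and a child can be sketched with os.pipe and os.fork on a Unix-like system. This is an illustration in Python rather than xv6's C, and the message text is arbitrary.

```python
import os

read_fd, write_fd = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: the writing stage of the pipeline.
    os.close(read_fd)
    os.write(write_fd, b"hello through a pipe\n")
    os.close(write_fd)
    os._exit(0)

# Parent: the reading stage. Close the unused write end first, otherwise
# read would never see end-of-file once the child is done.
os.close(write_fd)
chunks = []
while True:
    chunk = os.read(read_fd, 4096)   # blocks until data or end-of-file
    if not chunk:
        break
    chunks.append(chunk)
os.close(read_fd)
os.waitpid(pid, 0)
message = b"".join(chunks)
print(message)
```

The reader simply blocks until data arrives, and the kernel buffer disappears once both ends are closed, which is the automatic cleanup mentioned above.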

The xv6 file system provides data files, which are uninterpreted byte arrays, and directories, which contain named references to data files and other directories. The directories form a tree, starting at a special directory called the root.
mknod creates a file in the file system, but the file has no contents. Instead, the file's metadata marks it as a device file and records the major and minor device numbers, which uniquely identify a kernel device.
A file's name is distinct from the file itself; the same underlying file, called an inode, can have multiple names, called links. The link system call creates another file system name referring to the same inode as an existing file.
The unlink system call removes a name from the file system. The file's inode and the disk space holding its content are only freed when the file's link count is zero and no file descriptors refer to it.
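The relationship between names, inodes, and link counts can be observed from user space. The sketch below assumes a POSIX file system that supports hard links; the function name and file names are made up. It creates a file, adds a second name with link, removes the first name with unlink, and records the link count at each step.

```python
import os
import tempfile


def link_counts():
    """Return link counts before link, after link, and after unlink."""
    with tempfile.TemporaryDirectory() as directory:
        first = os.path.join(directory, "a")
        second = os.path.join(directory, "b")
        with open(first, "w") as f:
            f.write("data")
        counts = [os.stat(first).st_nlink]        # one name so far
        os.link(first, second)                    # second name, same inode
        counts.append(os.stat(first).st_nlink)
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        os.unlink(first)                          # removes a name, not the file
        counts.append(os.stat(second).st_nlink)
        with open(second) as f:
            content = f.read()                    # still reachable via "b"
    return counts, same_inode, content
```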
cd must change the current working directory of the shell itself.
Unix's combination of the "standard" file descriptors, pipes, and convenient shell syntax for operations on them was a major advance in writing general-purpose reusable programs. The idea sparked a whole culture of "software tools" that was responsible for much of Unix's power and popularity, and the shell was the first so-called "scripting language".
The Unix system call interface has been standardized through the Portable Operating System Interface (POSIX) standard.
Our main goals for xv6 are simplicity and clarity while providing a simple UNIX-like system call interface.
The authors of Unix went on to build Plan 9, which applied the "resources are files" concept to modern facilities, representing networks, graphics, and other resources as files or file trees.
Any operating system must multiplex processes onto the underlying hardware, isolate processes from each other, and provide mechanisms for controlled inter-process communication.
A key requirement for an operating system is to support several activities at once.
An operating system must fulfill three requirements: multiplexing, isolation, and interaction.
To achieve strong isolation it's helpful to forbid applications from directly accessing sensitive hardware resources, and instead to abstract the resources into services.
Unix transparently switches hardware processors among processes, saving and restoring register state as necessary, so that applications don't have to be aware of time sharing.
Unix processes use exec to build up their memory image, instead of directly interacting with physical memory. This allows the operating system to decide where to place a process in memory; if memory is tight, the operating system might even store some of a process's data on disk.
The Unix interface is not the only way to abstract resources, but it has proven to be a very good one.
Applications shouldn't be able to modify (or even read) the operating system's data structures or instructions, shouldn't be able to access other processes' memory, etc.
Processors provide hardware support for strong isolation.
An application can execute only user-mode instructions and is said to be running in user space, while the software in kernel mode can also execute privileged instructions and is said to be running in kernel space. The software running in kernel space (or in kernel mode) is called the kernel.
Processors provide a special instruction that switches the processor from user mode to kernel mode and enters the kernel at an entry point specified by the kernel.
A key design question is what part of the operating system should run in kernel mode. One possibility is that the entire operating system resides in the kernel, so that the implementations of all system calls run in kernel mode. This organization is called a monolithic kernel.
In a monolithic kernel, a mistake is fatal, because an error in kernel mode will often cause the kernel to fail. If the kernel fails, the computer stops working, and thus all applications fail too. The computer must reboot to start again.
To reduce the risk of mistakes in the kernel, OS designers can minimize the amount of operating system code that runs in kernel mode, and execute the bulk of the operating system in user mode. This kernel organization is called a microkernel.
OS services running as processes are called servers.
To allow applications to interact with the file server, the kernel provides an inter-process communication mechanism to send messages from one user-mode process to another.
In a microkernel, the kernel interface consists of a few low-level functions for starting applications, sending messages, accessing device hardware, etc. This organization allows the kernel to be relatively simple, as most of the operating system resides in user-level servers.
The unit of isolation in xv6 is a process. The process abstraction prevents one process from wrecking or spying on another process' memory, CPU, file descriptors, etc. It also prevents a process from wrecking the kernel itself, so that a process can't subvert the kernel's isolation mechanisms.
The mechanisms used by the kernel to implement processes include the user/kernel mode flag, address spaces, and time-slicing of threads.
A process provides a program with what appears to be a private memory system, or address space, which other processes can not read or write. A process also provides the program with what appears to be its own CPU to execute the program's instructions.
xv6 uses page tables (which are implemented by hardware) to give each process its own address space. The x86 page table translates (or "maps") a virtual address (the address that an x86 instruction manipulates) to a physical address (an address that the processor chip sends to main memory).
xv6 maintains a separate page table for each process that defines that process's address space.
Each process's address space maps the kernel's instructions and data as well as the user program's memory. When a process invokes a system call, the system call executes in the kernel mappings of the process's address space. This arrangement exists so that the kernel's system call code can directly refer to user memory.
A process's most important pieces of kernel state are its page table, its kernel stack, and its run state.
Each process has a thread of execution (or thread for short) that executes the process's instructions.
Each process has two stacks: a user stack and a kernel stack.
The kernel stack is separate (and protected from user code) so that the kernel can execute even if a process has wrecked its user stack.
When a process makes a system call, the processor switches to the kernel stack, raises the hardware privilege level, and starts executing the kernel instructions that implement the system call. When the system call completes, the kernel returns to user space: the hardware lowers its privilege level, switches back to the user stack, and resumes executing user instructions just after the system call instruction.
The first step in providing strong isolation is setting up the kernel to run in its own address space.
The way that control transfers from user software to the kernel is via an interrupt mechanism, which is used by system calls, interrupts, and exceptions.
Page tables are the mechanism through which the operating system controls what memory addresses mean. They allow xv6 to multiplex the address spaces of different processes onto a single physical memory, and to protect the memories of different processes.
As a reminder, x86 instructions (both user and kernel) manipulate virtual addresses. The machine's RAM, or physical memory, is indexed with physical addresses. The x86 page table hardware connects these two kinds of addresses, by mapping each virtual address to a physical address.
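To make the mapping concrete, here is a simplified software model of 32-bit x86 two-level translation: the top 10 bits of a virtual address select a page directory entry, the next 10 bits select a page table entry, and the low 12 bits are the offset within a 4096-byte page. Nested dictionaries stand in for the hardware tables, permission bits are left out, and the function name is hypothetical.

```python
PAGE_SIZE = 4096


def translate(page_directory, vaddr):
    """Map a 32-bit virtual address to a physical address, or raise LookupError."""
    dir_index = (vaddr >> 22) & 0x3FF     # top 10 bits: page directory index
    table_index = (vaddr >> 12) & 0x3FF   # next 10 bits: page table index
    offset = vaddr & 0xFFF                # low 12 bits: offset within the page
    page_table = page_directory.get(dir_index)
    if page_table is None:
        raise LookupError("no page table for this address")
    frame = page_table.get(table_index)   # physical page number
    if frame is None:
        raise LookupError("page not mapped")
    return frame * PAGE_SIZE + offset
```

For example, with a directory of {1: {2: 0x80}}, the virtual address (1 << 22) | (2 << 12) | 0x34 translates to 0x80034, while an address in an unmapped page raises an error, which loosely mirrors the exception the hardware raises when it cannot translate an address.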
Each process has a separate page table, and xv6 tells the page table hardware to switch page tables when xv6 switches between processes.
The kernel must allocate and free physical memory at run-time for page tables, process user memory, kernel stacks, and pipe buffers.
To guard a stack growing off the stack page, xv6 places a guard page right below the stack. The guard page is not mapped and so if the stack runs off the stack page, the hardware will generate an exception because it cannot translate the faulting address.
sbrk is the system call for a process to shrink or grow its memory.
Exec is the system call that creates the user part of an address space.
When running a process, a CPU executes the normal processor loop: read an instruction, advance the program counter, execute the instruction, repeat. But there are events on which control from a user program must transfer back to the kernel instead of executing the next instruction. These events include a device signaling that it wants attention, a user program doing something illegal, or a user program asking the kernel for a service with a system call.
There are three cases when control must be transferred from a user program to the kernel.

First, a system call: when a user program asks for an operating system service.
Second, an exception: when a program performs an illegal action.
Third, an interrupt: when a device generates a signal to indicate that it needs attention from the operating system.

On the x86, interrupt handlers are defined in the interrupt descriptor table (IDT). The IDT has 256 entries, each giving the %cs and %eip to be used when handling the corresponding interrupt.
To make a system call on the x86, a program invokes the int n instruction, where n specifies the index into the IDT.
An operating system can use the iret instruction to return from an int instruction. It pops the values saved during the int instruction from the stack, and resumes execution at the saved %eip.
Devices on the motherboard can generate interrupts, and xv6 must set up the hardware to handle these interrupts.
Interrupts are usually optional in the sense that the kernel could instead periodically check (or "poll") the device hardware to check for new events. Interrupts are preferable to polling if the events are relatively rare, so that polling would waste CPU time.
Interrupts are similar to system calls, except devices generate them at any time.
A driver is the code in an operating system that manages a particular device: it tells the device hardware to perform operations, configures the device to generate interrupts when done, and handles the resulting interrupts. Driver code can be tricky to write because a driver executes concurrently with the device that it manages.
In many operating systems, the drivers together account for more code in the operating system than the core kernel.
A lock provides mutual exclusion, ensuring that only one CPU at a time can hold the lock. If a lock is associated with each shared data item, and the code always holds the associated lock when using a given item, then we can be sure that the item is used from only one CPU at a time. In this situation, we say that the lock protects the data item.
You must keep in mind that a single C statement can be several machine instructions and thus another processor or an interrupt may muck around in the middle of a C statement. You cannot assume that lines of code on the page are executed atomically. Concurrency makes reasoning about correctness much more difficult.
A race condition is a situation in which a memory location is accessed concurrently, and at least one access is a write. A race is often a sign of a bug, either a lost update or a read of an incompletely-updated data structure. The outcome of a race depends on the exact timing of the two CPUs involved and how their memory operations are ordered by the memory system, which can make race-induced errors difficult to reproduce and debug.
The usual way to avoid races is to use a lock. Locks ensure mutual exclusion, so that only one CPU at a time can execute the code that the lock protects.
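A small sketch of a lock protecting a shared data item. Python threads stand in for CPUs here, and the names are invented for the example. The increment is a read-modify-write sequence, so it is wrapped in the lock to make sure no update is lost.

```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Increment a shared counter from several threads, holding a lock each time."""
    counter = 0
    lock = threading.Lock()   # the lock that protects `counter`

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread at a time runs this section
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

With the lock held around every increment, the result is always num_threads * increments.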
Invariants are properties of data structures that are maintained across operations.
Interrupts can cause concurrency even on a single processor: if interrupts are enabled, kernel code can be stopped at any moment to run an interrupt handler instead.
Many compilers and processors execute code out of order to achieve higher performance. If an instruction takes many cycles to complete, a processor may want to issue the instruction early so that it can overlap with other instructions and avoid processor stalls.
It is best to use locks as the base for higher-level constructs like synchronized queues.
It is possible to implement locks without atomic instructions, but it is expensive, and most operating systems use atomic instructions.
Switching from one thread to another involves saving the old thread's CPU registers, and restoring the previously-saved registers of the new thread; the fact that %esp and %eip are saved and restored means that the CPU will switch stacks and switch what code it is executing.
The reason to enable interrupts periodically on an idling CPU is that there might be no RUNNABLE process because processes are waiting for I/O; if the scheduler left interrupts disabled all the time, the I/O would never arrive.
The purpose of a file system is to organize and store data. File systems typically support sharing of data among users and applications, as well as persistence so that data is still available after a reboot.
The file system needs on-disk data structures to represent the tree of named directories and files, to record the identities of the blocks that hold each file's content, and to record which areas of the disk are free.
One of the most interesting problems in file system design is crash recovery. The problem arises because many file system operations involve multiple writes to the disk, and a crash after a subset of the writes may leave the on-disk file system in an inconsistent state.
From our point of view, we can abstract the PC into three components: CPU, memory, and input/output (I/O) devices. The CPU performs computation, the memory contains instructions and data for that computation, and devices allow the CPU to interact with hardware for storage, communication, and other functions.
A computer's CPU runs a conceptually simple loop: it consults an address in a register called the program counter, reads a machine instruction from that address in memory, advances the program counter past the instruction, and executes the instruction. Repeat.
A register is a storage cell inside the processor itself, capable of holding a machine word-size value. Data stored in registers can typically be read or written quickly, in a single CPU cycle.
The modern x86 provides eight general purpose 32-bit registers--eax, ebx, ecx, edx, edi, esi, ebp, and esp--and a program counter eip.
Main memory is 10-100x slower than a register, but it is much cheaper, so there can be more of it. One reason main memory is relatively slow is that it is physically separate from the processor chip.
The cache memory serves as a middle ground between registers and memory both in access time and in size.

20180126

The little book about OS development by Erik Helin & Adam Renberg

The little book about OS development

We use C because developing an OS requires a very precise control of the generated code and direct access to memory.
Booting an operating system consists of transferring control along a chain of small programs, each one more "powerful" than the previous one, where the operating system is the last "program".
When the PC is turned on, the computer will start a small program that adheres to the Basic Input Output System (BIOS) standard. This program is usually stored on a read only memory chip on the motherboard of the PC.
Modern operating systems do not use the BIOS functions, they use drivers that interact directly with the hardware, bypassing the BIOS.
The BIOS program will transfer control of the PC to a program called a bootloader. The bootloader's task is to transfer control to us, the operating system developers, and our code.
The bootloader will transfer control to the operating system by jumping to a position in memory.
Assembly is very good for interacting with the CPU and enables maximum control over every aspect of the code. However, C is a much more convenient language to use. Therefore, we would like to use C as much as possible and use assembly code only where it makes sense.
One prerequisite for using C is a stack, since all non-trivial C programs use a stack. Setting up a stack is no harder than making the esp register point to the end of an area of free memory that is correctly aligned.
When compiling the C code for the OS, a lot of flags to GCC need to be used. This is because the C code should not assume the presence of a standard library, since there is no standard library available for our OS.
There are usually two different ways to interact with the hardware, memory-mapped I/O and I/O ports.
If the hardware uses memory-mapped I/O then you can write to a specific memory address and the hardware will be updated with the new data.
If the hardware uses I/O ports then the assembly code instructions out and in must be used to communicate with the hardware.
The frame-buffer is a hardware device that is capable of displaying a buffer of memory on the screen.
When data is transmitted via the serial port it is placed in buffers, both when receiving and sending data. This way, if you send data to the serial port faster than it can send it over the wire, it will be buffered.
When paging is disabled, the linear address space is mapped 1:1 onto the physical address space, and the physical memory can be accessed.
To enable segmentation you need to set up a table that describes each segment--a segment descriptor table.
The operating system must be able to handle interrupts in order to read information from the keyboard.
Interrupts are handled via the Interrupt Descriptor Table (IDT). The IDT describes a handler for each interrupt. The interrupts are numbered and the handler for interrupt i is defined at the ith position in the table.
When an interrupt occurs the CPU will push some information about the interrupt onto the stack, then look up the appropriate interrupt handler in the IDT and jump to it.
The interrupt handler has to be written in assembly code, since all registers that the interrupt handlers use must be preserved by pushing them onto the stack. This is because the code that was interrupted doesn't know about the interrupt and will therefore expect that its registers stay the same.
To start using hardware interrupts you must first configure the Programmable Interrupt Controller (PIC). The PIC makes it possible to map signals from the hardware to interrupts.
Every interrupt from the PIC has to be acknowledged--that is, a message must be sent to the PIC confirming that the interrupt has been handled. If this isn't done the PIC won't generate any more interrupts.
Virtual memory is an abstraction of physical memory. The purpose of virtual memory is generally to simplify application development and to let processes address more memory than what is actually physically present in the machine.
Managing memory is a big part of what an operating system does.
Segmentation translates a logical address into a linear address. Paging translates these linear addresses onto the physical address space, and determines access rights and how the memory should be cached.
Paging is the most common technique used in x86 to enable virtual memory.
The simplest kind of paging is when we map each virtual address onto the same physical address, called identity paging.
Preferably, the kernel should be placed at a very high virtual memory address.
Paging enables two things that are good for virtual memory. First, it allows for fine-grained access control to memory. Second, it creates the illusion of contiguous memory.
A virtual file system (VFS) creates an abstraction on top of the concrete file systems. A VFS mainly supplies the path system and file hierarchy, it delegates operations on files to the underlying file systems.
System calls are the way user-mode applications interact with the kernel--to ask for resources, request operations to be performed, etc. The system call API is the part of the kernel that is most exposed to the users, therefore its design requires some thought.
System calls are traditionally invoked with software interrupts. The user applications put the appropriate values in registers or on the stack and then initiate a pre-defined interrupt which transfers execution to the kernel. The interrupt number used is dependent on the kernel.
Creating new processes is usually done with two different system calls: fork and exec. fork creates an exact copy of the currently running process, while exec replaces the current process with one that is specified by a path to the location of a program in the file system.
The easiest way to achieve rapid switching between processes is if the processes themselves are responsible for the switching. The processes run for a while and then tell the OS (via a system call) that it can now switch to another process. Giving up the control of the CPU to another process is called yielding, and scheduling done this way is called cooperative scheduling, since all the processes must cooperate with each other.
Since cooperative scheduling is deterministic, it is much easier to debug than preemptive scheduling.
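Cooperative scheduling can be imitated with Python generators, where each yield plays the role of the "I am done for now" system call. This is only an analogy: the task and scheduler names are invented, and a real kernel switches stacks and registers rather than resuming generators.

```python
from collections import deque


def task(name, steps, log):
    """A toy process: do one unit of work, then yield the CPU voluntarily."""
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield


def run_round_robin(tasks):
    """Run each task until it yields, then move it to the back of the queue."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # runs until the task's next yield
        except StopIteration:
            continue               # task finished; drop it from the queue
        ready.append(current)


def demo_schedule():
    log = []
    run_round_robin([task("A", 2, log), task("B", 3, log)])
    return log
```

Running demo_schedule() always produces the same interleaving, which is the determinism the note above refers to.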

20180125

Think OS by Allen B. Downey

Think OS: A Brief Introduction to Operating Systems

In a statically-typed language, you can tell by looking at the program what type each variable refers to. In a dynamically-typed language, you don't always know the type of a variable until the program is running.
In general, "static" refers to things that happen at compile time, and "dynamic" refers to things that happen at run time.
The location of a variable is called its "address".
During parsing, the compiler reads the source code and builds an internal representation of the program, called an "abstract syntax tree".
The compiler reads the internal representation of the program and generates machine code or byte code.
It is usually a good idea to turn off optimization while you are developing new code. Once the program is working and passing appropriate tests, you can turn on optimization and confirm that the tests still pass.
An abstraction is a simplified representation of something complicated.
A large part of software engineering is designing abstractions that allow users and other programmers to use powerful and complicated systems without having to know about the details of their implementation.
An important kind of abstraction is virtualization, which is the process of creating a desirable illusion.
One of the most important principles of engineering is isolation: when you are designing a system with multiple components, it is usually a good idea to isolate them from each other so that a change in one component doesn't have undesired effects on other components.
One of the most important goals of an operating system is to isolate each running program from the others so that programmers don't have to think about every possible interaction.
A process is a software object that represents a running program.
Most operating systems have the ability to interrupt a running process at almost any time, save its hardware state, and then resume the process later.
Most operating systems create the illusion that each process has its own chunk of memory, isolated from all other processes.
"daemon" is used in the sense of a helpful spirit, with no connotation of evil.
A bit is a binary digit; it is also a unit of information. [...] In general, if you have b bits, you can indicate one of 2^b values.
In general, if the probability of the outcome is 1 in N, then the outcome contains log2(N) bits of information.
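The two statements are inverses of each other, which a couple of lines of Python can confirm. The function names are made up for the illustration.

```python
import math


def values_for_bits(b):
    """Number of distinct values that b bits can indicate."""
    return 2 ** b


def information_bits(n):
    """Bits of information in an outcome whose probability is 1 in n."""
    return math.log2(n)
```

So 3 bits can indicate one of 8 values, and an outcome with probability 1 in 8 carries 3 bits of information.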
Intuitively, unexpected news carries a lot of information; conversely, if there is something you were already confident of, confirming it contributes only a small amount of information.
While a process is running, most of its data is held in "main memory", which is usually some kind of random access memory (RAM).
Each byte in main memory is specified by an integer "physical address". The set of valid physical addresses is called the physical "address space".
When a program reads and writes values in memory, it generates virtual addresses. The hardware, with help from the operating system, translates to physical addresses before accessing main memory. This translation is done on a per-process basis, so even if two processes generate the same virtual address, they would map to different locations in physical memory.
Virtual memory is one important way the operating system isolates processes from each other.
The data of a running process is organized into 4 segments:

The text segment contains the program text; that is, the machine language instructions that make up the program.
The static segment contains variables that are allocated by the compiler, including global variables and local variables that are declared static.
The stack segment contains the run-time stack, which is made up of stack frames. Each stack frame contains the parameters and local variables of a function.
The heap segment contains chunks of memory allocated at run time, usually by calling the C library function malloc.

Local variables on the stack are sometimes called "automatic", because they are allocated automatically when a function is called, and freed automatically when the function returns.
In C there is another kind of local variable, called "static", which is allocated in the static segment. It is initialized when the program starts and keeps its value from one function call to the next.
Most processors provide a memory management unit (MMU) that sits between the CPU and main memory. The MMU performs fast translation between VAs and PAs.
If a process doesn't use a virtual page, we don't need an entry in the page table for it.
Searching an associative table can be slow in software, but in hardware we can search the entire table in parallel, so associative arrays are often used to represent page tables in the MMU.
The fundamental idea is that page tables are sparse, so we have to choose a good implementation for sparse arrays.
A "file system" is a mapping from each file's name to its contents.
A "file" is a sequence of bytes.
The gap in performance between main memory and persistent storage is one of the major challenges of computer system design.
Sometimes the operating system can predict that a process will read a block and start loading it before it is requested.
If a process has used a block recently, it is likely to use it again soon. If the operating system keeps a copy of the block in memory, it can handle future requests at memory speed.
Performance techniques: block transfers, prefetching, buffering, caching.
A UNIX inode contains information about the file including: the user ID of the file owner, permission flags indicating who is allowed to read/write/execute it, and timestamps that indicate when it was last modified and accessed.
The file abstraction is really a "stream of bytes" abstraction, which turns out to be useful for many things, not just file systems.
Floating-point numbers are represented using the binary version of scientific notation.
Most computers use the IEEE standard for floating-point arithmetic.
There are two common uses of C unions. One is to access the binary representation of data. Another is to store heterogeneous data.
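In C, a union of a float and an unsigned integer is the usual way to look at a float's bits. As a sketch of the same idea in Python, the struct module can reinterpret the bytes of an IEEE 754 single-precision value so the sign, exponent, and mantissa fields can be pulled apart. The function name is hypothetical.

```python
import struct


def float32_fields(x):
    """Split a single-precision float into (sign, biased exponent, mantissa)."""
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # 23 bits, implicit leading 1 not stored
    return sign, exponent, mantissa
```

For 1.0 this gives (0, 127, 0): positive sign, exponent 0 after removing the bias, and an all-zero fraction.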
Memory management is one of the most challenging parts of designing large software systems, which is why most modern languages provide higher-level memory management features like garbage collection.
Memory errors can be difficult to find because the symptoms are unpredictable.
Safe memory management requires design and discipline.
Often there is a trade off between safe memory management and performance.
If you allocate a chunk of memory and never free it, that's a "memory leak".
Because memory management is so difficult, most large programs, like web browsers, leak memory.
When malloc allocates a chunk, it adds space at the beginning and end to store information about the chunk, including its size and the state (allocated or free).
Fragmentation wastes space; it also slows the program down by making memory caches less effective.
One of the fundamental problems of computer architecture is the "memory bottleneck".
A "cache" is a small, fast memory on the same chip as the CPU.
The tendency of a program to use the same data more than once is called "temporal locality". The tendency to use data in nearby locations is called "spatial locality".
Caches are fast because they are small and close to the CPU, which minimizes delays due to capacitance and signal propagation. If you make a cache big, it will be slower.
Most cache policies are based on the principle that history repeats itself; if we have information about the recent past, we can use it to predict the immediate future.
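One common policy built on that principle is least recently used (LRU) replacement, which the notes above do not name; it is used here only as an example. The sketch below is a software model with invented names, not a description of how any particular hardware cache works. It evicts the entry that has gone unused the longest, betting that recently used data will be used again soon.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)                     # the slow path, e.g. main memory
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value
```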
In an operating system, the kernel is the lowest level of software, surrounded by several other layers, including an interface called a "shell".
At its most basic, the kernel's job is to handle interrupts. An "interrupt" is an event that stops the normal instruction cycle and causes the flow of execution to jump to a special section of code called an "interrupt handler".
A hardware interrupt is caused when a device sends a signal to the CPU.
A software interrupt is caused by a running program.
When a program needs to access a hardware device, it makes a "system call", which is similar to a function call, except that instead of jumping to the beginning of the function, it executes a special instruction that triggers an interrupt, causing the flow of execution to jump to the kernel. The kernel reads the parameters of the system call, performs the requested operation, and then resumes the interrupted process.
In general, the kernel doesn't know which registers a process will use, so it has to save all of them.
In a multi-tasking system, each process is allowed to run for a short period of time called a "time slice" or "quantum".
When a process is created, the operating system allocates a data structure that contains information about the process, called a "process control block" or PCB.
Most schedulers use some form of priority-based scheduling, where each process has a priority that can be adjusted up or down over time. When the scheduler runs, it chooses the runnable process with the highest priority.
Simple scheduling policies are usually good enough.
Scheduling tasks to meet deadlines is called "real-time scheduling".
When you create a process, the operating system creates a new address space, which includes the text segment, static segment, and heap; it also creates a new "thread of execution", which includes the program counter and other hardware state, and the run-time stack.
A "mutex" is an object that guarantees a "mutual exclusion" for a block of code; that is, only one thread can execute the block at a time.
Many simple synchronization problems can be solved using mutexes.
A condition variable is a data structure associated with a condition; it allows threads to block until the condition becomes true.
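A short sketch of a condition variable in use, written with Python's threading module rather than the C threads API the book works with. The class and function names are made up. The condition here is "the queue is not empty": a consumer blocks in wait until a producer adds an item and calls notify.

```python
import threading
from collections import deque


class BlockingQueue:
    """Queue whose get() blocks until an item is available."""

    def __init__(self):
        self.items = deque()
        self.cond = threading.Condition()   # bundles a mutex with the condition

    def put(self, item):
        with self.cond:
            self.items.append(item)
            self.cond.notify()              # wake one waiting consumer

    def get(self):
        with self.cond:
            while not self.items:           # re-check after every wakeup
                self.cond.wait()            # releases the mutex while blocked
            return self.items.popleft()


def demo_queue():
    queue = BlockingQueue()
    results = []

    def consume():
        for _ in range(3):
            results.append(queue.get())

    consumer = threading.Thread(target=consume)
    consumer.start()
    for item in range(3):
        queue.put(item)
    consumer.join()
    return results
```

The while loop around wait matters: a woken thread must confirm that the condition really holds before it proceeds.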
There are some synchronization problems that can be solved simply with semaphores, yielding solutions that are more demonstrably correct.
One nice thing about using wrapper functions is that you can encapsulate the error checking code, which makes the code that uses these functions more readable.
Any problem that can be solved with semaphores can also be solved with condition variables and mutexes.
