2017-03-25

"ALGORITHMS TO LIVE BY: THE COMPUTER SCIENCE OF HUMAN DECISIONS" by Brian Christian and Tom Griffiths


  • But an algorithm is just a finite sequence of steps used to solve a problem, and algorithms are much broader—and older by far—than the computer.
  • Long before algorithms were ever used by machines, they were used by people.
  • Algorithms have been a part of human technology ever since the Stone Age.
  • Applying the lens of computer science to everyday life has consequences at many scales. Most immediately, it offers us practical, concrete suggestions for how to solve specific problems.
  • At the next level, computer science gives us a vocabulary for understanding the deeper principles at play in each of these domains.
  • Examining cognition as a means of solving the fundamentally computational problems posed by our environment can utterly change the way we think about human rationality.
  • Instead, tackling real-world tasks requires being comfortable with chance, trading off time with accuracy, and using approximations.
  • Over the past decade or two, behavioral economics has told a very particular story about human beings: that we are irrational and error-prone, owing in large part to the buggy, idiosyncratic hardware of the brain.
  • Life is full of problems that are, quite simply, hard.
  • The 37% Rule* derives from optimal stopping’s most famous puzzle, which has come to be known as the “secretary problem.”
  • Instead, the optimal solution takes the form of what we’ll call the Look-Then-Leap Rule: You set a predetermined amount of time for “looking”—that is, exploring your options, gathering data—in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase.
  • As the applicant pool grows, the exact place to draw the line between looking and leaping settles to 37% of the pool, yielding the 37% Rule: look at the first 37% of the applicants,* choosing none, then be ready to leap for anyone better than all those you’ve seen so far.
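
The Look-Then-Leap Rule is easy to check by simulation. In this sketch the function names and the pool setup are mine, not the book's; only the rule itself comes from the text:

```python
import random

def look_then_leap(candidates, look_fraction=0.37):
    """Reject the first `look_fraction` of candidates outright, then
    commit to the first one who beats everyone from the look phase."""
    cutoff = int(len(candidates) * look_fraction)
    best_seen = max(candidates[:cutoff]) if cutoff else float("-inf")
    for score in candidates[cutoff:]:
        if score > best_seen:
            return score
    return candidates[-1]  # ran out of options: forced to take the last one

def success_rate(n=100, trials=5_000):
    """How often the rule picks the single best candidate in the pool."""
    wins = 0
    for _ in range(trials):
        pool = random.sample(range(n * 10), n)  # distinct scores, random order
        if look_then_leap(pool) == max(pool):
            wins += 1
    return wins / trials
```

With a pool of 100 applicants, the measured success rate hovers near the theoretical 37%.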
  • Every day we are constantly forced to make decisions between options that differ in a very specific dimension: do we try new things or stick with our favorite ones?
  • Simply put, exploration is gathering information, and exploitation is using the information you have to get a known good result.
  • The old adage tells us that “the grass is always greener on the other side of the fence,” but the math tells us why: the unknown has a chance of being better, even if we actually expect it to be no different, or if it’s just as likely to be worse.
  • Exploration in itself has value, since trying new things increases our chances of finding the best.
  • If the Gittins index is too complicated, or if you’re not in a situation well characterized by geometric discounting, then you have another option: focus on regret.
  • These regrets are often about the things we failed to do, the options we never tried.
  • Regret is the result of comparing what we actually did with what would have been best in hindsight.
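
One regret-minimizing strategy the book returns to later is the Upper Confidence Bound: judge each option not by its average payoff alone but by the best it could plausibly be. A minimal sketch (the UCB1 scoring formula is a standard one; the function name is mine):

```python
import math

def ucb1_choice(counts, rewards):
    """Pick the arm with the highest average payoff plus an exploration
    bonus; the bonus shrinks the more often an arm has been tried."""
    total = sum(counts)
    def score(i):
        if counts[i] == 0:
            return float("inf")  # never-tried options get first priority
        average = rewards[i] / counts[i]
        bonus = math.sqrt(2 * math.log(total) / counts[i])
        return average + bonus
    return max(range(len(counts)), key=score)
```

An option tried only twice can beat one with a higher average and a hundred trials, because its upside is still largely unknown.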
  • Sorting is at the very heart of what computers do.
  • sorting is essential to working with almost any kind of information.
  • After all, one of the main reasons things get sorted is to be shown in useful form to human eyes, which means that sorting is also key to the human experience of information.
  • This is the first and most fundamental insight of sorting theory. Scale hurts.
  • Worst-case analysis lets us make hard guarantees: that a critical process will finish in time, that deadlines won’t be blown.
  • Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation.
  • Big-O notation has a particular quirk, which is that it’s inexact by design.
  • Because Big-O notation deliberately sheds fine details, what emerges is a schema for dividing problems into different broad classes.
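
Those broad classes become visible the moment you count steps. This sketch (the instrumented sorters are my own illustration, not from the book) contrasts a quadratic sort with an n-log-n one:

```python
def count_bubble_sort_comparisons(items):
    """Bubble sort, instrumented: always makes n(n-1)/2 comparisons, O(n^2)."""
    a, comparisons = list(items), 0
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return comparisons

def count_merge_sort_comparisons(items):
    """Merge sort, instrumented: at most about n * log2(n) comparisons."""
    comparisons = 0

    def sort(a):
        nonlocal comparisons
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = sort(a[:mid]), sort(a[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            comparisons += 1
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    sort(list(items))
    return comparisons
```

At 200 items the quadratic sort already does roughly 20,000 comparisons to merge sort's hundreds; Big-O ignores the constants, but the gap between the classes only widens with scale.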
  • Computer science, as undergraduates are taught, is all about tradeoffs.
  • The basic principle is this: the effort expended on sorting materials is just a preemptive strike against the effort it’ll take to search through them later.
  • Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.
  • Caching plays a critical role in the architecture of memory, and it underlies everything from the layout of processor chips at the millimeter scale to the geography of the global Internet.
  • The idea of keeping around pieces of information that you refer to frequently is so powerful that it is used in every aspect of computation.
  • The goal of cache management is to minimize the number of times you can’t find what you’re looking for in the cache and must go to the slower main memory to find it; these failures are known as “page faults” or “cache misses.”
  • The optimal cache eviction policy—essentially by definition, Bélády wrote—is, when the cache is full, to evict whichever item we’ll need again the longest from now.
  • The LRU principle is effective because of something computer scientists call “temporal locality”: if a program has called for a particular piece of information once, it’s likely to do so again in the near future.
  • LRU teaches us that the next thing we can expect to need is the last one we needed, while the thing we’ll need after that is probably the second-most-recent one. And the last thing we can expect to need is the one we’ve already gone longest without.
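
The eviction rule itself is a few lines of code. This sketch (a hypothetical class, not from the book) uses Python's OrderedDict to keep items in recency order:

```python
from collections import OrderedDict

class LRUCache:
    """Evict the Least Recently Used item whenever the cache overflows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # ordered oldest -> newest access

    def get(self, key):
        if key not in self._items:
            return None  # a "cache miss"
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
```

Every access shuffles an item to the "front," exactly like Noguchi's left-side insertion rule below.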
  • This fundamental insight—that in-demand files should be stored near the location where they are used—also translates into purely physical environments.
  • Having a cache is efficient, but having multiple levels of caches—from smallest and fastest to largest and slowest—can be even better.
  • The left-side insertion rule, Noguchi specifies, has to be followed for old files as well as new ones: every time you pull out a file to use its contents, you must put it back as the leftmost file when you return it to the box. And when you search for a file, you always start from the left-hand side as well. The most recently accessed files are thus the fastest to find.
  • The Noguchi Filing System clearly saves time when you’re replacing something after you’re done using it.
  • Unavoidably, the larger a memory is, the more time it takes to search for and extract a piece of information from it.
  • Getting Things Done advocates a policy of immediately doing any task of two minutes or less as soon as it comes to mind. Rival bestseller Eat That Frog! advises beginning with the most difficult task and moving toward easier and easier things.
  • The Now Habit suggests first scheduling one’s social engagements and leisure time and then filling the gaps with work—rather than the other way around, as we so often do.
  • By having the shortest washing times at the start, and the shortest drying times at the end, you maximize the amount of overlap—when the washer and dryer are running simultaneously. Thus you can keep the total amount of time spent doing laundry to the absolute minimum. Johnson’s analysis had yielded scheduling’s first optimal algorithm: start with the lightest wash, end with the smallest hamper.
  • Thus we encounter the first lesson in single-machine scheduling literally before we even begin: make your goals explicit.
  • This is something of a theme in computer science: before you can have a plan, you must first choose a metric.
  • Minimizing the sum of completion times leads to a very simple optimal algorithm called Shortest Processing Time: always do the quickest task you can.
  • Shortest Processing Time gets things done.
  • Again, there’s no way to change the total amount of time your work will take you, but Shortest Processing Time may ease your mind by shrinking the number of outstanding tasks as quickly as possible.
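
Shortest Processing Time is just a sort on duration. This sketch (the task names and durations are invented for illustration) also computes the sum-of-completion-times metric it optimizes:

```python
def shortest_processing_time(durations):
    """Order tasks by duration, quickest first."""
    return sorted(durations, key=durations.get)

def sum_of_completion_times(durations, order):
    """The metric SPT minimizes: each task's finish time, summed."""
    elapsed, total = 0, 0
    for task in order:
        elapsed += durations[task]
        total += elapsed  # this task completes at the running elapsed time
    return total
```

For durations {'email': 1, 'errand': 2, 'report': 4}, SPT finishes tasks at times 1, 3, and 7 (total 11), while the reverse order finishes them at 4, 6, and 7 (total 17): the same total work, but fewer tasks hanging over you at any moment.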
  • A commitment to fastidiously doing the most important thing you can, if pursued in a head-down, myopic fashion, can lead to what looks for all the world like procrastination.
  • As business writer and coder Jason Fried says, “Feel like you can’t proceed until you have a bulletproof plan in place? Replace ‘plan’ with ‘guess’ and take it easy.” Scheduling theory bears this out.
  • Every time you switch tasks, you pay a price, known in computer science as a context switch.
  • Every context switch is wasted time.
  • This is thrashing: a system running full-tilt and accomplishing nothing at all.
  • Another way to avert thrashing before it starts is to learn the art of saying no.
  • The moral is that you should try to stay on a single task as long as possible without decreasing your responsiveness below the minimum acceptable limit.
  • This is the crux of Bayes’s argument. Reasoning forward from hypothetical pasts lays the foundation for us to then work backward to the most probable one.
  • The best way to make good predictions, as Bayes’s Rule shows us, is to be accurately informed about the things you’re predicting.
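
Bayes's Rule itself is one line of arithmetic per hypothesis: multiply each prior by the likelihood of the evidence, then renormalize. A sketch with invented numbers (say 1 in 100 coins is double-headed, and we observe a single flip come up heads):

```python
def posterior(prior, likelihood):
    """Bayes's Rule: P(H | E) is proportional to P(E | H) * P(H)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical numbers: 1% of coins are double-headed; one flip shows heads.
belief = posterior(
    prior={"fair": 0.99, "two-headed": 0.01},
    likelihood={"fair": 0.5, "two-headed": 1.0},
)
```

One head barely moves the needle (the "two-headed" belief rises to about 2%); a long run of heads would eventually overwhelm the prior.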
  • Once you know about overfitting, you see it everywhere.
  • Sam Altman, president of the startup incubator Y Combinator, echoes Jobs’s words of caution: “It really is true that the company will build whatever the CEO decides to measure.”
  • Simply put, Cross-Validation means assessing not only how well a model fits the data it’s given, but how well it generalizes to data it hasn’t seen.
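
The mechanics are simply holding data back. In this sketch (both toy "models" are my own) a model that memorizes its training points scores perfectly on them and badly on held-out data, which is overfitting made visible:

```python
import random

def train_test_split(data, test_fraction=0.3, seed=0):
    """Hold out a slice of the data that the model never sees while fitting."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def mean_squared_error(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

def fit_memorizer(train):
    """An overfit model: perfect on training points, clueless elsewhere."""
    table = dict(train)
    return lambda x: table.get(x, 0.0)

def fit_mean(train):
    """A maximally simple model: always predict the training average."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean
```

The memorizer "fits the data it's given" flawlessly; cross-validation is what exposes that it fails to generalize.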
  • But as computer scientists have discovered over the past few decades, there are entire classes of problems where a perfect solution is essentially unreachable, no matter how fast we make our computers or how cleverly we program them.
  • It’s possible to quantify the difficulty of a problem. And some problems are just … hard.
  • One of the simplest forms of relaxation in computer science is known as Constraint Relaxation. In this technique, researchers remove some of the problem’s constraints and set about solving the problem they wish they had. Then, after they’ve made a certain amount of headway, they try to add the constraints back in. That is, they make the problem temporarily easier to handle before bringing it back to reality.
  • If you can’t solve the problem in front of you, solve an easier version of it—and then see if that solution offers you a starting point, or a beacon, in the full-blown problem.
  • Sometimes the best solution to a problem is to turn to chance rather than trying to fully reason out an answer.
  • Laplace’s proposal pointed to a profound general truth: when we want to know something about a complex quantity, we can estimate its value by sampling from it.
  • Today the Monte Carlo Method is one of the cornerstones of scientific computing.
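
The classic classroom example of the method: estimate pi by sampling random points and counting what fraction land inside a quarter circle:

```python
import random

def estimate_pi(samples=100_000, seed=42):
    """Monte Carlo estimate: the fraction of random points in the unit
    square that fall inside the quarter circle approaches pi / 4."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples
```

More samples buy more precision; the estimate's error shrinks roughly with the square root of the sample count.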
  • Algorithms for finding prime numbers date back at least as far as ancient Greece, where mathematicians used a straightforward approach known as the Sieve of Eratosthenes.
  • The Miller-Rabin primality test, as it’s now known, provides a way to quickly identify even gigantic prime numbers with an arbitrary degree of certainty.
  • Time and space are at the root of the most familiar tradeoffs in computer science, but recent work on randomized algorithms shows that there’s also another variable to consider: certainty.
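
The certainty trade works like this sketch (a standard Miller-Rabin implementation, not the book's code): each extra random round cuts the chance of mistakenly calling a composite number prime by at least a factor of four, so certainty is purchased with time.

```python
import random

def is_probably_prime(n, rounds=20, seed=None):
    """Miller-Rabin primality test: probabilistic, with tunable certainty."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):  # quick trial division for small factors
        if n % p == 0:
            return n == p
    # Write n - 1 as 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    rng = random.Random(seed)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True  # probably prime; error chance below 4 ** -rounds
```

Twenty rounds leave an error chance below one in a trillion, which is why tests like this are trusted to find the gigantic primes behind modern cryptography.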
  • When it comes to stimulating creativity, a common technique is introducing a random element, such as a word that people have to form associations with.
  • Wikipedia, for instance, offers a “Random article” link, and Tom has been using it as his browser’s default homepage for several years, seeing a randomly selected Wikipedia entry each time he opens a new window.
  • First, from Hill Climbing: even if you’re in the habit of sometimes acting on bad ideas, you should always act on good ones. Second, from the Metropolis Algorithm: your likelihood of following a bad idea should be inversely proportional to how bad an idea it is. Third, from Simulated Annealing: you should front-load randomness, rapidly cooling out of a totally random state, using ever less and less randomness as time goes on, lingering longest as you approach freezing. Temper yourself—literally.
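
All three lessons fit in one routine. This sketch (the parameters and cost function are my own choices) always keeps improvements, accepts worse moves with probability exp(-delta / temperature), and cools the temperature as it goes:

```python
import math
import random

def simulated_annealing(cost, start, step=1.0, temp=10.0, cooling=0.999,
                        iterations=20_000, seed=0):
    """Front-loaded randomness: a hot random walk that hardens into
    pure hill climbing as the temperature falls."""
    rng = random.Random(seed)
    x, cx = start, cost(start)
    best, cbest = x, cx
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)
        c_cand = cost(candidate)
        delta = c_cand - cx
        # Always take improvements; take bad moves less often as it cools.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, cx = candidate, c_cand
        if cx < cbest:
            best, cbest = x, cx
        temp = max(temp * cooling, 1e-12)
    return best
```

Started in the shallow right-hand well of x^4 - 3x^2 + x, the early high-temperature phase lets it wander out of the local minimum and settle into the deeper well near x ≈ -1.3, which pure hill climbing from the same start would never find.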
  • The foundation of human connection is protocol—a shared convention of procedures and expectations, from handshakes and hellos to etiquette, politesse, and the full gamut of social norms.
  • In a packet-switched network, rather than using a dedicated channel for each connection, senders and receivers atomize their messages into tiny shards known as “packets,” and merge them into the communal flow of data—a bit like postcards moving at the speed of light.
  • In packet switching, on the other hand, the proliferation of paths in a growing network becomes a virtue: there are now that many more ways for data to flow, so the reliability of the network increases exponentially with its size.
  • Exponential Backoff has become the default way of handling almost all cases of networking failure or unreliability.
  • Exponential Backoff is also a critical part of networking security, when successive password failures in logging into an account are punished by an exponentially increasing lockout period. This prevents a hacker from using a “dictionary attack” against an account, cycling through potential password after password until eventually they get lucky.
  • In human society, we tend to adopt a policy of giving people some finite number of chances in a row, then giving up entirely. Three strikes, you’re out. This pattern prevails by default in almost any situation that requires forgiveness, lenience, or perseverance. Simply put, maybe we’re doing it wrong.
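
A retry loop with Exponential Backoff is short. In this sketch (the function and its parameters are illustrative) delays are recorded rather than slept, so the doubling schedule is visible:

```python
import random

def retry_with_backoff(operation, max_attempts=6, base_delay=1.0, seed=0):
    """Retry a failing operation, waiting up to twice as long each time.

    Returns (result, delays); real code would time.sleep(delay) instead
    of just recording it. Jitter (a random wait up to the cap) is used
    here, as real network stacks commonly do."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_attempts):
        try:
            return operation(), delays
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of patience: give up for real
            delays.append(rng.uniform(0, base_delay * 2 ** attempt))
```

A flaky call that fails twice and then succeeds waits at most 1 second, then at most 2, before getting through; the cap keeps doubling, so persistence never dies entirely but the cost of each retry shrinks.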
  • Essentially, AIMD takes the form of someone saying, “A little more, a little more, a little more, whoa, too much, cut way back, okay a little more, a little more…” Thus it leads to a characteristic bandwidth shape known as the “TCP sawtooth”—steady upward climbs punctuated by steep drops.
  • The lesson of the TCP sawtooth is that in an unpredictable and changing environment, pushing things to the point of failure is indeed sometimes the best (or the only) way to use all the resources to their fullest. What matters is making sure that the response to failure is both sharp and resilient.
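
The sawtooth falls out of two rules. This sketch (the window units and event stream are invented) adds a constant on every acknowledgment and halves the window on every drop:

```python
def aimd_trace(events, start=1.0, increase=1.0, decrease=0.5):
    """Additive Increase, Multiplicative Decrease.

    `events` is a sequence of booleans: True for a delivered packet
    (an ACK), False for a drop. Returns the window size after each event."""
    window, trace = start, []
    for acked in events:
        if acked:
            window += increase  # additive increase: creep upward
        else:
            window = max(window * decrease, 1.0)  # multiplicative decrease
        trace.append(window)
    return trace
```

Four ACKs, a drop, then three more ACKs produce the climb-crash-climb shape: [2, 3, 4, 5, 2.5, 3.5, 4.5, 5.5].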
  • As the saying goes, “the most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’”
  • When a networking buffer fills up, what typically happens is called Tail Drop: an unceremonious way of saying that every packet arriving after that point is simply rejected, and effectively deleted.
  • Dropped packets are the Internet’s primary feedback mechanism.
  • Smoothing out bursts is great if you are, on average, clearing things at least as quickly as they’re arriving—but if your average workload exceeds your average work rate, no buffer can work miracles.
  • One of the fundamental principles of buffers, be they for packets or patrons, is that they only work correctly when they are routinely zeroed out.
  • The feeling that one needs to look at everything on the Internet, or read all possible books, or see all possible shows, is bufferbloat.
  • We used to request dedicated circuits with others; now we send them packets and wait expectantly for ACKs. We used to reject; now we defer.
  • With their inherent latency, buffers are bad for most interactive processes.
  • Capacity does matter sometimes: for transferring large files, bandwidth is key.
  • For interhuman applications, however, a quick turnaround time is often far more important, and what we really need are more Concordes.
  • As Alan Turing proved in 1936, a computer program can never tell you for sure whether another program might end up calculating forever without end—except by simulating the operation of that program and thus potentially going off the deep end itself.
  • Here’s the problem. No matter what your accomplice does, it’s always better for you to defect.
  • A dominant strategy avoids recursion altogether, by being the best response to all of your opponent’s possible strategies—so you don’t even need to trouble yourself getting inside their head at all. A dominant strategy is a powerful thing.
  • Properly appreciating the mechanics of financial bubbles begins with understanding auctions.
  • Investors are said to fall into two broad camps: “fundamental” investors, who trade on what they perceive as the underlying value of a company, and “technical” investors, who trade on the fluctuations of the market.
  • Adopting a strategy that doesn’t require anticipating, predicting, reading into, or changing course because of the tactics of others is one way to cut the Gordian knot of recursion. And sometimes that strategy is not just easy—it’s optimal.
  • If changing strategies doesn’t help, you can try to change the game.
  • Seek out games where honesty is the dominant strategy. Then just be yourself.
  • First, there are cases where computer scientists and mathematicians have identified good algorithmic approaches that can simply be transferred over to human problems. The 37% Rule, the Least Recently Used criterion for handling overflowing caches, and the Upper Confidence Bound as a guide to exploration are all examples of this.
  • Second, knowing that you are using an optimal algorithm should be a relief even if you don’t get the results you were looking for.
  • Even the best strategy sometimes yields bad results—which is why computer scientists take care to distinguish between “process” and “outcome.”
  • A theme that came up again and again in our interviews with computer scientists was: sometimes “good enough” really is good enough.
  • One of the implicit principles of computer science, as odd as it may sound, is that computation is bad: the underlying directive of any good algorithm is to minimize the labor of thought.
  • The deeper point is that subtle changes in design can radically shift the kind of cognitive problem posed to human users.
