- Big-O time is the language and metric we use to describe the efficiency of algorithms. Not understanding it thoroughly can really hurt you in developing an algorithm.
- In academia, Big-O describes an upper bound on the time.
- Big-omega is the equivalent concept but for lower bounds.
- Big-theta means both Big-O and Big-omega. That is, an algorithm's runtime is Big-theta if it is both Big-O and Big-omega. Big-theta gives a tight bound on runtime.
- We rarely ever discuss best case time complexity, because it’s not a very useful concept. After all, we could take essentially any algorithm, special case some input, and then get an O(1) time in the best case.
- For many (probably most) algorithms, the worst case and the expected case are the same.
- Big O, big omega, and big theta describe the upper, lower, and tight bounds for the runtime.
- Space complexity is a parallel concept to time complexity. If we need to create an array of size n, this will require O(n) space. If we need a two-dimensional array of size n x n, this will require O(n^2) space.
- When you have a recursive function that makes multiple calls, the runtime will often (but not always) look like O(branches^depth), where branches is the number of times each recursive call branches.
- Generally speaking, when you see an algorithm with multiple recursive calls, you’re looking at exponential runtime.
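As a quick illustration (my own sketch, not the book's code): the naive recursive Fibonacci branches twice per call down to a depth of roughly n, so its runtime is O(2^n), a direct instance of O(branches^depth).

```java
// Naive recursive Fibonacci: two branches per call, depth ~n, so O(2^n).
public class NaiveFib {
    static long fib(int n) {
        if (n <= 1) return n;           // base cases: fib(0) = 0, fib(1) = 1
        return fib(n - 1) + fib(n - 2); // two recursive branches per call
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
    }
}
```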
- Here’s a list of the absolute, must-have knowledge:
- Data Structures
  - Linked lists
  - Trees, tries, and graphs
  - Stacks & queues
  - Heaps
  - Vectors/ArrayLists
  - Hash tables
- Algorithms
  - Breadth-first search
  - Depth-first search
  - Binary search
  - Merge sort
  - Quick sort
- Concepts
  - Bit manipulation
  - Memory (stack vs. heap)
  - Recursion
  - Dynamic programming
  - Big-O time & space
- In particular, hash tables are an extremely important topic. Make sure you are very comfortable with this data structure.
- Despite being possibly slow, a brute force algorithm is valuable to discuss. It’s a starting point for optimizations, and it helps you wrap your head around the problem.
- Once you have a brute force algorithm, you should work on optimizing it.
- “BUD” is perhaps the most useful approach I’ve found for optimizing problems. It’s a silly acronym for: Bottlenecks, Unnecessary work, Duplicated work.
- A bottleneck is a part of your algorithm that slows down the overall runtime.
- Optimizing a bottleneck can make a big difference in your overall runtime.
- Be particularly aware of any “optimizations” you intuitively or automatically made.
- The best conceivable runtime is, literally, the best runtime you could conceive of a solution to a problem having. You can easily prove that there is no way you could beat the BCR.
- Best Conceivable Runtime is not a “real” algorithm concept, in that you won’t find it in algorithm textbooks. But I have found it personally very useful, when solving problems myself, as well as while coaching people through problems.
- One sign of a careful coder is that she doesn't make assumptions about the input. Instead, she validates that the input is what it should be, either through ASSERT statements or if-statements.
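A minimal sketch of what that validation might look like in Java (the method names here are made up for illustration):

```java
class InputValidation {
    // If-statement validation: reject bad input explicitly.
    static int kthElement(int[] arr, int k) {
        if (arr == null || k < 0 || k >= arr.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        return arr[k];
    }

    // Assert-style validation of an internal invariant
    // (asserts are enabled at runtime with the -ea flag).
    static double average(int[] arr) {
        assert arr != null && arr.length > 0 : "arr must be non-empty";
        double sum = 0;
        for (int x : arr) sum += x;
        return sum / arr.length;
    }
}
```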
- All else being equal, of course stability is a good thing. No one wants to be fired or laid off. However, all else isn’t actually equal. The more stable companies are also often growing more slowly.
- A hash table is a data structure that maps keys to values for highly efficient lookup. There are a number of ways of implementing this.
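One common implementation is an array of linked lists indexed by hash code. Here is a minimal sketch of that approach (class and method names are mine, and real implementations also resize as they fill):

```java
import java.util.LinkedList;

// Chained hash table sketch: an array of linked lists, indexed by
// hashCode mapped into the array's range. Lookups average O(1).
public class SimpleHashTable<K, V> {
    private static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public SimpleHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length); // hash code -> bucket index
    }

    public void put(K key, V value) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite existing key
        }
        buckets[indexFor(key)].add(new Entry<>(key, value));    // otherwise append to the chain
    }

    public V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key absent
    }
}
```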
- In some languages, arrays (often called lists in this case) are automatically resizable. The array or list will grow as you append items. In other languages, like Java, arrays are fixed length. The size is defined when you create the array.
- When you need an array-like data structure that offers dynamic resizing, you would usually use an ArrayList. An ArrayList is an array that resizes itself as needed while still providing O(1) access. A typical implementation is that when the array is full, the array doubles in size.
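A sketch of the doubling strategy (mine, simplified to ints). The occasional O(n) resize is spread across n appends, so appends are amortized O(1):

```java
// Simplified dynamic array: doubles its backing array when full.
public class DynamicArray {
    private int[] data = new int[1];
    private int size = 0;

    public void append(int value) {
        if (size == data.length) {                    // full: double the capacity
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    public int get(int index) { return data[index]; } // O(1) access, like a plain array
}
```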
- A linked list is a data structure that represents a sequence of nodes. In a singly linked list, each node points to the next node in the list. A doubly linked list gives each node pointers to both the next node and the previous node.
- Unlike an array, a linked list does not provide constant time access to a particular index within the list. This means that if you'd like to find the K-th element in the list, you will need to iterate through K elements.
- The benefit of a linked list is that you can add and remove items from the beginning of the list in constant time.
- The “runner” technique means that you iterate through the linked list with two pointers simultaneously, with one ahead of the other. The “fast” node might be ahead by a fixed amount, or it might be hopping multiple nodes for each one node that the “slow” node iterates through.
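A classic use of the runner technique (my sketch) is finding the middle of a list: the fast pointer hops two nodes per step while the slow pointer hops one, so the slow pointer sits at the middle when the fast one reaches the end.

```java
// Runner technique: fast moves two hops per step, slow moves one.
class ListNode {
    int value; ListNode next;
    ListNode(int value) { this.value = value; }
}

class Runner {
    static ListNode middle(ListNode head) {
        ListNode slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;      // one hop
            fast = fast.next.next; // two hops
        }
        return slow;               // middle node (second middle for even lengths)
    }
}
```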
- A stack can also be used to implement a recursive algorithm iteratively.
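For example (a sketch of mine), in-order traversal of a binary tree can be written iteratively by managing the call stack ourselves with an explicit stack:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TreeNode {
    int value; TreeNode left, right;
    TreeNode(int value) { this.value = value; }
}

class IterativeInOrder {
    static void inOrder(TreeNode root) {
        Deque<TreeNode> stack = new ArrayDeque<>();
        TreeNode current = root;
        while (current != null || !stack.isEmpty()) {
            while (current != null) {          // push the entire left spine
                stack.push(current);
                current = current.left;
            }
            current = stack.pop();             // visit the node
            System.out.println(current.value);
            current = current.right;           // then descend into the right subtree
        }
    }
}
```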
- A binary search tree is a binary tree in which every node fits a specific order property: all left descendants <= n < all right descendants. This must be true for each node n.
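A sketch of checking that property (reusing the TreeNode class from the previous snippet): each node must fall within a range that narrows as we descend.

```java
// BST check: every value must satisfy min < value <= max, per the
// "all left descendants <= n < all right descendants" property.
class BstChecker {
    static boolean isBst(TreeNode root) {
        return isBst(root, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    private static boolean isBst(TreeNode node, long min, long max) {
        if (node == null) return true;
        if (node.value <= min || node.value > max) return false;
        return isBst(node.left, min, node.value)    // left descendants <= node.value
            && isBst(node.right, node.value, max);  // right descendants > node.value
    }
}
```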
- A trie (sometimes called a prefix tree) is a funny data structure. It comes up a lot in interview questions, but algorithm textbooks don’t spend much time on this data structure.
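A minimal trie sketch (mine, restricted to lowercase words) with insert, exact lookup, and prefix lookup:

```java
// Trie node: 26 children (one per lowercase letter) plus a flag
// marking the end of a complete word.
class Trie {
    private final Trie[] children = new Trie[26];
    private boolean isWord = false;

    void insert(String word) {
        Trie node = this;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (node.children[i] == null) node.children[i] = new Trie();
            node = node.children[i];
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Trie node = this;
        for (char c : word.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null) return false;
        }
        return node.isWord;                 // reached the end: is it a complete word?
    }

    boolean startsWith(String prefix) {
        Trie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null) return false; // no stored word has this prefix
        }
        return true;
    }
}
```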
- Bidirectional search is used to find the shortest path between a source and destination node. It operates by essentially running two simultaneous breadth-first searches, one from each node. When their searches collide, we have found a path.
- The Sieve of Eratosthenes is a highly efficient way to generate a list of primes. It works by recognizing that all non-prime numbers are divisible by a prime number.
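A sketch of the sieve (mine): cross off multiples of each prime, and whatever is never crossed off is prime.

```java
class Sieve {
    // Returns a flag array where isPrime[i] is true iff i is prime.
    static boolean[] primesUpTo(int max) {
        boolean[] isPrime = new boolean[max + 1];
        if (max < 2) return isPrime;                      // no primes below 2
        java.util.Arrays.fill(isPrime, 2, max + 1, true);
        for (int p = 2; (long) p * p <= max; p++) {
            if (!isPrime[p]) continue;                    // p was crossed off: not prime
            for (int m = p * p; m <= max; m += p) {
                isPrime[m] = false;                       // m is divisible by prime p
            }
        }
        return isPrime;
    }
}
```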
- Be careful you don’t fall into a trap of constantly trying to find the “right” design pattern for a particular problem. You should create the design that works for that problem. In some cases it might be an established pattern, but in many other cases it is not.
- The Singleton pattern ensures that a class has only one instance and ensures access to the instance throughout the application. It can be useful in cases where you have a “global” object with exactly one instance.
- It should be noted that many people dislike the Singleton design pattern, even calling it an “anti-pattern”. One reason for this is that it can interfere with unit testing.
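A minimal lazy Singleton sketch (the class name is just an example; note this version is not thread-safe, so a real one would synchronize getInstance or use an enum):

```java
public class Restaurant {
    private static Restaurant instance = null;

    private Restaurant() { }             // private constructor: no outside instantiation

    public static Restaurant getInstance() {
        if (instance == null) {
            instance = new Restaurant(); // created lazily, on first use
        }
        return instance;
    }
}
```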
- The Factory Method offers an interface for creating an instance of a class, with its subclasses deciding which class to instantiate. You might want to implement this with the creator class being abstract and not providing an implementation for the factory method. Or, you could have the creator class be a concrete class that provides an implementation for the factory method.
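A sketch of the concrete-creator variant (all class names here are mine):

```java
// The static factory method decides which Game subclass to instantiate;
// callers never name the concrete classes directly.
abstract class Game { }
class PokerGame extends Game { }
class BlackjackGame extends Game { }

class GameFactory {
    public static Game createGame(String type) {
        switch (type) {
            case "poker":     return new PokerGame();
            case "blackjack": return new BlackjackGame();
            default: throw new IllegalArgumentException("Unknown game: " + type);
        }
    }
}
```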
- While there are a large number of recursive problems, many follow similar patterns. A good hint that a problem is recursive is that it can be built off of subproblems.
- Recursive algorithms can be very space inefficient. Each recursive call adds a new layer to the stack, which means that if your algorithm recurses to a depth of n, it uses at least O(n) memory.
- Dynamic programming is mostly just a matter of taking a recursive algorithm and finding the overlapping subproblems (that is, the repeated calls). You then cache those results for future recursive calls.
- One of the simplest examples of dynamic programming is computing the nth Fibonacci number. A good way to approach such a problem is often to implement it as a normal recursive solution, and then add the caching part.
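Continuing the Fibonacci sketch from earlier: adding a memo array is the only change needed to go from O(2^n) to O(n).

```java
// Memoized Fibonacci: each fib(i) is computed once and cached.
public class MemoFib {
    public static long fib(int n) {
        return fib(n, new long[Math.max(2, n + 1)]);
    }

    private static long fib(int n, long[] memo) {
        if (n <= 1) return n;
        if (memo[n] == 0) {   // not yet computed (fib(n) > 0 for all n >= 1)
            memo[n] = fib(n - 1, memo) + fib(n - 2, memo);
        }
        return memo[n];
    }
}
```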
- Drawing the recursive calls as a tree is a great way to figure out the runtime of a recursive algorithm.
- A system can be scaled one of two ways:
- Vertical scaling means increasing the resources of a specific node. For example, you might add additional memory to a server to improve its ability to handle load changes.
- Horizontal scaling means increasing the number of nodes. For example, you might add additional servers, thus decreasing the load on any one server.
- Typically, some frontend parts of a scalable website will be thrown behind a load balancer. This allows a system to distribute the load evenly so that one server doesn’t crash and take down the whole system. To do so, of course, you have to build out a network of cloned servers that all have essentially the same code and access to the same data.
- Sharding means splitting the data across multiple machines while ensuring you have a way of figuring out which data is on which machine.
- An in-memory cache can deliver very rapid results. It is a simple key-value pairing and typically sits between your application layer and your data store.
- Slow operations should ideally be done asynchronously. Otherwise, a user might get stuck waiting and waiting for a process to complete.
- Some of the most important metrics around networking include:
- Bandwidth: This is the maximum amount of data that can be transferred in a unit of time. It is typically expressed in bits per second.
- Throughput: Whereas bandwidth is the maximum data that can be transferred in a unit of time, throughput is the actual amount of data that is transferred.
- Latency: This is how long it takes data to go from one end to the other. That is, it is the delay between the sender sending information and the receiver receiving it.
- MapReduce allows us to do a lot of processing in parallel, which makes processing huge amounts of data more scalable.
- MapReduce is often associated with Google, but it’s used much more broadly than that. A MapReduce program is typically used to process large amounts of data.
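The canonical example is word counting. Here is a single-machine sketch of the idea (mine, with no framework; a real MapReduce job would distribute the map, shuffle, and reduce phases across many machines):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Word count, MapReduce-style in miniature: the inner loop plays "map"
// by emitting (word, 1) pairs, and merge() plays "shuffle + reduce" by
// grouping on the key and summing each group.
class WordCount {
    static Map<String, Integer> run(List<String> documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            for (String word : doc.toLowerCase().split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```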
- Understanding the common sorting and searching algorithms is incredibly valuable, as many sorting and searching problems are tweaks of the well-known algorithms. A good approach is therefore to run through the different sorting algorithms and see if one applies particularly well.
- Learning the common sorting algorithms is a great way to boost your performance. Of the five algorithms explained below, merge sort, quick sort, and bucket sort are the most commonly used in interviews.
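As a refresher, here is merge sort (my sketch): O(n log n) time in all cases, with O(n) extra space for the helper array used while merging.

```java
class MergeSort {
    static void sort(int[] array) {
        mergeSort(array, new int[array.length], 0, array.length - 1);
    }

    private static void mergeSort(int[] a, int[] helper, int low, int high) {
        if (low >= high) return;
        int middle = (low + high) / 2;
        mergeSort(a, helper, low, middle);      // sort the left half
        mergeSort(a, helper, middle + 1, high); // sort the right half
        merge(a, helper, low, middle, high);    // merge the sorted halves
    }

    private static void merge(int[] a, int[] helper, int low, int middle, int high) {
        for (int i = low; i <= high; i++) helper[i] = a[i];
        int left = low, right = middle + 1, current = low;
        while (left <= middle && right <= high) {
            a[current++] = (helper[left] <= helper[right]) ? helper[left++] : helper[right++];
        }
        while (left <= middle) a[current++] = helper[left++]; // copy leftovers from the left half
        // leftovers from the right half are already in place
    }
}
```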
- No product is fail-proof, so analyzing failure conditions needs to be a part of your testing. A good discussion to have with your interviewer is about when it’s acceptable (or even necessary) for the product to fail, and what failure should mean.
- All data members and methods are private by default in C++. One can modify this by introducing the keyword public.
- The constructor of a class is automatically called upon an object’s creation. If no constructor is defined, the compiler automatically generates one called the Default Constructor. Alternatively, we can define our own constructor.
- The destructor cleans up upon object deletion and is automatically called when an object is destroyed. It cannot take an argument as we don’t explicitly call a destructor.
- Unlike pointers, references cannot be null and cannot be reassigned to another piece of memory.
- Templates are a way of reusing code to apply the same class to different data types.
- Normalized databases are designed to minimize redundancy, while denormalized databases are designed to optimize read time.
- Threads within a given process share the same memory space, which is both a positive and negative. It enables threads to share data, which can be valuable. However, it also creates the opportunity for issues when two threads modify a resource at the same time.
- For more granular control, we can utilize a lock. A lock (or monitor) is used to synchronize access to a shared resource by associating the resource with the lock. A thread gets access to a shared resource by first acquiring the lock associated with the resource. At any given time, at most one thread can hold the lock and, therefore, only one thread can access the shared resource.
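A sketch of that in Java (my example; the point is the acquire/release discipline, not this particular API):

```java
import java.util.concurrent.locks.ReentrantLock;

// Only the thread holding the lock may touch the balance, so two
// concurrent withdrawals cannot interleave mid-update.
class Account {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance = 100;

    void withdraw(int amount) {
        lock.lock();               // acquire; blocks while another thread holds it
        try {
            if (balance >= amount) {
                balance -= amount; // safe: no other thread is in this section
            }
        } finally {
            lock.unlock();         // always release, even on exceptions
        }
    }
}
```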
20210316
Cracking the Coding Interview by Gayle McDowell