
Arrays

Interview Essentials

Arrays are one of the most basic data structures, but they make for some very hard interview questions! It’s imperative that you know the array methods in your chosen interviewing language very well, because interviewers will definitely be judging you on this.

An array is a data structure that contains a group of elements. The most basic implementation of an array is a static array, in which the array has a fixed size that must be declared when it is initialized. In technical interviews, you’re also going to come across questions about dynamic arrays (similar to static arrays, but with space reserved for additional elements) and multidimensional arrays (arrays in which the individual elements are also arrays).

Static Array

What Is a Static Array?

A random-access data structure that is of fixed size. "Adding" or "deleting" an element from an array requires creating a new, appropriately sized contiguous chunk of memory and copying the elements that you want to keep into the new array.
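To see what that costs, here is a sketch in Python (simulating a fixed-size array with a fixed-length list; illustrative only, since Python's built-in lists are actually dynamic):

# "Deleting" index k from a static array: allocate a new,
# smaller contiguous block and copy the survivors into it.
def static_delete(arr, k):
    new_arr = [None] * (len(arr) - 1)    # new contiguous block
    for i in range(k):                   # copy elements before index k
        new_arr[i] = arr[i]
    for i in range(k + 1, len(arr)):     # copy elements after index k, shifted left
        new_arr[i - 1] = arr[i]
    return new_arr                       # O(n) copies in total

# static_delete([3, 1, 4, 1, 5], 2) -> [3, 1, 1, 5]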

Crucial Terms

Strengths and Weaknesses

Strengths
Weaknesses
Common Operations Cheat Sheet

Operation | Description | Time complexity | Mutates structure
append(item) | Adds item to the end of the array | O(n) | Yes
delete(index) | Removes the item at index from the array | O(n) | Yes
[index] | Retrieves the element at index | O(1) | No

Dynamic Array

What Is a Dynamic Array?

A random-access, variable-sized list data structure that allows elements to be added or removed. Like a regular array, but with reserved space for additional elements.
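To make the reserved-space idea concrete, here is a minimal sketch of an append that doubles the physical size whenever the logical size catches up (illustrative only; real implementations vary in growth factor and bookkeeping):

class DynamicArray:
    def __init__(self):
        self.physical = 1                  # allocated slots
        self.logical = 0                   # slots actually in use
        self.data = [None] * self.physical

    def append(self, item):
        if self.logical == self.physical:
            # out of reserved space: allocate a bigger block and copy, O(n)
            self.physical *= 2
            new_data = [None] * self.physical
            for i in range(self.logical):
                new_data[i] = self.data[i]
            self.data = new_data
        self.data[self.logical] = item     # the common case is O(1)
        self.logical += 1

Because the O(n) copy happens only when the size doubles, the cost averages out to the amortized O(1) append listed in the cheat sheet below.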

Crucial Terms

A dynamic array with a physical size of 7 elements and a logical size of 4 elements.

Strengths and Weaknesses

Strengths
Weaknesses

In Interviews, Use Dynamic Arrays When...

Common Operations Cheat Sheet

Operation | Description | Time complexity | Mutates structure
append(item) | Adds item to the end of the list | O(1) (amortized) | Yes
delete(index) | Removes item from the list | O(n) in general | Yes
delete(index) | Removes the last item from the list | O(1) from the end of the list | Yes
[index] | Retrieves the element at index | O(1) | No

Multidimensional Arrays

What Is a Multidimensional Array?

A multidimensional array is a random-access data structure in which the data is accessed by more than one index. Some languages, such as C#, have built-in support for multidimensional arrays, while others support multiple indices by making an "array of arrays". A multidimensional array that needs N numbers to reference a piece of data is called an N-dimensional array, or ND array.

While multidimensional arrays of any size are possible, it's most important for you to be familiar with 2D arrays. You will often see 2D arrays used to implement:

Crucial Terms

Different languages have different internal representations of multidimensional arrays. When implementing a 2D array as an "array of arrays" (i.e. array[i] is itself an array), there is no technical requirement that len(array[i]) == len(array[j]). If the rows have different lengths, we call the structure a jagged array. When discussing multidimensional arrays, we will mean regular (or rectangular) arrays, in which each dimension has a fixed length.
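In a language that builds multidimensional arrays as arrays of arrays (Python shown here), the jagged/rectangular distinction looks like this:

# A rectangular 2D array: every row has the same length.
rectangular = [[0, 1, 2],
               [3, 4, 5]]

# A jagged array: rows may have different lengths.
jagged = [[0, 1, 2],
          [3],
          [4, 5]]

print(rectangular[1][2])                # element at row 1, column 2 -> 5
print(len(jagged[0]), len(jagged[1]))   # 3 1 -- row lengths differ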

Strengths and Weaknesses

Strengths
Weaknesses

Common Operations Cheat Sheet

Operation | Description | Time complexity | Mutates structure
append(item) | Adds item to the end of the list | Implementation dependent | Yes
delete(index) | Removes item from the list | Implementation dependent | Yes
[index1, index2, ..., indexN] | Retrieves the element at (index1, index2, ..., indexN) | O(1) | No
size() | Returns the size of the array (d1, d2, ..., dN) | O(1) | No
dim() | Returns the dimension N of the array | O(1) | No

Linked Lists

Interview Essentials

A linked list is a sequential-access data structure used for storing ordered elements. They prioritize quick and easy data insertion and deletion over item lookup. All linked lists are collections of node data structures that are connected by pointers - that's where the "link" in "linked list" comes from.

Linked list questions are short and simple for an interviewer to ask, but they have complex solutions that really show off your problem-solving and coding skills. This means that they’re very common in technical interviews, so it’s important to be comfortable with using this data structure. You’ll use linked lists to solve questions in which having quick insertion and deletion is important, and when it’s not important to be able to have random access to elements.

Types of Linked Lists

We primarily see two types of linked lists:

Doubly linked lists require more space per node and their basic operations are more expensive than for singly linked lists. However, they are often easier to manipulate than singly linked lists because they allow for fast and easy sequential access to the list in both directions.

Essential Vocab

A node for a singly linked list and a doubly linked list

A singly linked list with a size of 4
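As a sketch, the nodes pictured above might be defined like this in Python (the field names are illustrative):

class SinglyNode:
    def __init__(self, data):
        self.data = data    # the stored value
        self.next = None    # pointer to the next node (or None at the tail)

class DoublyNode:
    def __init__(self, data):
        self.data = data
        self.next = None    # pointer to the next node
        self.prev = None    # extra pointer to the previous node (extra space per node)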

Strengths & Weaknesses

Linked lists are often compared to arrays, a data structure that provides random access to elements, so the strengths and weaknesses of linked lists are frequently framed relative to those of arrays.

Strengths
Weaknesses

In Interviews, Use a Linked List When...

Common Operations Cheat Sheet

Operation | Description | Time complexity | Mutates structure
find(value) | Returns a pointer to the first node containing value in its data field, or NULL if it isn't in the list. Can also be used for membership checking. | O(n) | No
find(index) | Returns a pointer to the index-th node from the head pointer, or NULL if index is greater than or equal to the length of the list. | O(index) | No
addBeginning(value) | Adds a new node with a data field containing value to the beginning of the list | O(1) | Yes
addAfter(value, n) | Adds a new node with a data field containing value after node n | O(1) | Yes
remove(value) | Removes the first instance of value from the list | O(n) if we need to find the element | Yes
removeNextElement(n) | Removes the node after n | O(1) | Yes
removeThisElement(n) | Removes node n | O(1) on a doubly linked list; O(n) on a singly linked list, since the previous node must be found | Yes

Hash Tables

Interview Essentials

Hash tables are a must-know data structure for technical interviews. This data structure is used to store an unordered collection of (key, value) pairs, where the key has to be unique. Item lookup by key, inserting a new item, and deleting an item are all fast operations (approximately O(1)).

Because they give you quick and cheap insertion, deletion, and lookups, you can use hash tables to solve many different types of interview questions. Hash tables are great for things like determining whether an element belongs to a collection, solving problems with invariants such as anagrams, and for solving problems where there is a unique, non-arbitrary identifier that you can use as a key (for instance, a person’s phone number or ID number).

A hash table is implemented with a function h(x), called a hash function or simply the hash. The hash function takes keys as inputs and returns a bucket index where the information is stored. The hash table becomes less efficient as the individual buckets store more information. A hash table is best suited to sparse problems, where the actual number of keys used is much smaller than the set of conceivable keys. For example, there are 10^10 possible 10-digit phone numbers, only a small fraction of which are actually in use.
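A minimal sketch of the bucket idea (here the hash function is Python's built-in hash reduced modulo the number of buckets; real tables also resize themselves as they fill):

class SimpleHashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # h(x): map the key to a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # otherwise store the new pair

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)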

Crucial Terms

Note: You don't have to worry about the following terms to simply use a hash table, but you do need to know them to implement one (or to answer interview questions about how they're implemented!).

Strengths & Weaknesses

Hash tables combine random-access ability with quick insertion and deletion, which makes them an extremely flexible and useful data structure. If you're doing something other than storing keys and values, or if you need to sort elements or iterate efficiently through them, though, you should use a different data structure.

Strengths

Note: These features assume the hash table is sparse.

Weaknesses

In Interviews, Use Hash Tables When...

Common Operations Cheat Sheet

The O(...) times listed below are the answers typically expected by interviewers. This assumes that none of the buckets has a large number of items in it. Items listed as O(1) are more accurately O(B), where B is the size of the bucket assigned by the hash function.

Operation | Description | Time complexity | Mutates structure
hash[key] = value | Adds a new (key, value) pair to the hash table | O(1) | Yes
del hash[key] | Removes (key, value) from the hash table | O(1) | Yes
key in hash | Looks up whether key is in the hash table | O(1) | No

Note: Lookup by value is not supported directly in a hash table. Instead, you'd iterate through all the keys and check for the value you were looking for (an O(n) procedure).

Trees: Basic

Interview Essentials

A tree is a data structure composed of parent and child nodes. Each node contains a value and a list of references to its child nodes. Trees borrow a lot of language from nature (each tree has a root node and leaf nodes) and from family trees (there are parent and child nodes).

Tree traversal problems are extremely common in technical interviews, so you need to be very familiar with how to do so. You’ll also run into situations in which you need to implement a tree yourself. Trees are particularly useful for storing sorted data that needs to be retrieved quickly, or for representing hierarchical data.

Crucial Terms

The tree is a collection of nodes that contain the data we want to store. In addition, each node has a collection of pointers to other nodes. The nodes referenced by the pointers inside a node X are referred to as the children of X.

In addition to terms describing the nodes in the tree, there are also some terms about the different parts of the tree.

There are many different types of trees that have their own terminology associated with them (such as Red-Black trees and AVL trees). Many of the different types of trees differ in their approach to balancing the tree. Balancing refers to inserting entries into a sorted tree in a way that minimizes the tree's depth. While many of these trees are quite esoteric for an interview, you must know the terminology for a binary search tree (BST). A BST is a sorted tree with a branching factor of two (i.e. every node has at most two children, referred to as the left and right nodes due to the way they are drawn). A BST requires that the left node has a value smaller than its parent, while the right node has a value larger than its parent.
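Here is a sketch of the BST property in code: insertion and search without any rebalancing (balanced variants such as AVL or Red-Black trees add rotation logic on top of this):

class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None     # subtree of smaller values
        self.right = None    # subtree of larger values

def bst_insert(node, value):
    # walk down the tree, preserving the BST ordering invariant
    if node is None:
        return BSTNode(value)
    if value < node.value:
        node.left = bst_insert(node.left, value)
    else:
        node.right = bst_insert(node.right, value)
    return node

def bst_search(node, value):
    if node is None:
        return False
    if value == node.value:
        return True
    return bst_search(node.left if value < node.value else node.right, value)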

Tree Basics

For a collection of nodes to be a non-empty tree, they need to satisfy four conditions:

  1. There needs to be exactly one root node (i.e. one node with no parent).
  2. Each node that is not the root node has exactly one parent node (i.e. no node is a child of two or more nodes).
  3. You need to be able to reach each node by starting at the root and following a sequence of "child" pointers.
  4. A node at level x cannot have a child at a level less than x. (In fact, all children of a node at level x have a level of x+1, because children of a node at level x are one extra step from the root.)

Strengths & Weaknesses

Strengths
Weaknesses

In Interviews, Use Trees When...

Common Operations Cheat Sheet

The time complexities quoted below are specific to a balanced binary search tree. When inserting or deleting a node, the time taken to rebalance the tree is also taken into account. Here n is the number of nodes in the tree.

Operation | Description | Time complexity | Mutates structure
insert(obj) | Adds obj into the BST, while maintaining a balanced tree | O(log n) | Yes
delete(obj) | Finds and deletes obj from the BST, while maintaining a balanced tree | O(log n) | Yes
search(obj) | Determines whether obj is in the tree | O(log n) | No
get(n) | Selects the nth highest object | O(log n) | No
rank(rand_obj) | Returns the number of nodes in the BST that are less than or equal to rand_obj. rand_obj does not have to appear in the tree. | O(log n) | No
<iterator>.next() | Gets the next element, starting at the current element | O(log n) | No
flatten() | Returns a list containing the nodes in sorted order | O(n) | Implementation dependent

Heaps, Stacks, Queues

Interview Essentials

Along with being comfortable with implementing heaps, stacks, and queues, it's important to be clear on the differences between these data structures. In interviews, it's common to get questions like, "Explain the differences between a heap and a stack."

Heaps

What are Heaps?

The purpose of a min-heap is to store objects that have a partial order on them. It has a fast (O(log n)) method for removing the minimum object from the heap. Min-heaps are useful for calculations where you have multiple minimum computations to perform.

There is an analogous structure called a max-heap, which extracts the maximum value from the heap.

A heap is either a max-heap or a min-heap - it can't be both. Given an implementation of a min-heap, a max-heap can be implemented by reversing the comparison between elements. We're only going to discuss min-heaps in the rest of this overview.

Heaps are sometimes referred to as priority queues. Technically, heaps are actually just one implementation of a priority queue.

Crucial Terms

There are some implementation-specific details about heaps (such as the heap property and complete binary trees) that are useful for understanding how to build a heap from scratch, but they aren't that useful for using heaps to solve interview problems.

Strengths and Weaknesses

Strengths
Weaknesses

In Interviews, Use Heaps When ...

Heaps are designed to do one specific thing well, so the answer to when you should use a heap is repetitive: You use one when you have to do repeated minimum (or maximum) extractions. The problems where you would want to do this might look quite different from each other, however. Below we have selected some examples of common interview questions that benefit from heaps.
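In Python, for example, the standard library's heapq module provides a min-heap over a plain list, covering the operations in the cheat sheet below:

import heapq

jobs = [(5, "low"), (1, "urgent"), (3, "normal")]
heapq.heapify(jobs)                  # O(n): build a min-heap, keyed on each tuple's first field

heapq.heappush(jobs, (2, "high"))    # O(log n) insert
print(jobs[0])                       # O(1) peek at the minimum -> (1, 'urgent')
print(heapq.heappop(jobs))           # O(log n) extract-min -> (1, 'urgent')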

Common Operations Cheat Sheet

The operations described below are for a min-heap implemented using a complete binary tree. There are obvious analogs for a max-heap.

Operation | Description | Time complexity | Mutates structure
insert(key) | Inserts key into the heap. Can be modified to insert a (key, value) pair. | O(log n) | Yes
extract_min() | Returns and removes the item with the minimum key from the heap | O(log n) | Yes
peek_min() | Views the minimum key without removing it from the heap | O(1) | No
heapify(list) | Creates a new heap from a list of n items. Functionally equivalent to starting with an empty heap and inserting elements one at a time, but faster than the O(n log n) of that approach. | O(n) | N/A (creates heap)

There are other ways of constructing heaps (such as binomial heaps and Fibonacci heaps) that can improve the asymptotic run time slightly, but that support the same operations.


Stacks and Queues

What Are Stacks and Queues?

Stacks and queues are grouped together because they share many of the same properties. They are both data structures designed to hold elements and return them in a specific order. The difference between the data structures is the order in which items are retrieved.

Because of the metaphors used, we often talk about the "top" and "bottom" of a stack, but the "front" and "back" of a queue.

Stacks and queues are both abstract data types, which means that we are guaranteed to be able to put elements in and remove them in the specified order. You can build a stack or queue out of whatever data structures you like, such as arrays or linked lists, if you provide a way to add elements in the correct order. For example, in Python, the standard list acts like both an array and a stack.

Crucial Terms

Stacks and queues both have an add operation (push/enqueue) and a remove operation (pop/dequeue).

The names of the functions can differ from those shown above, depending on how you're implementing the stack or queue and the programming language you use. When using a Python list as a stack, for example, L.append(item) plays the role of push, and L.pop() plays the role of pop.

A slick implementation of an object that is both a stack and a queue is to use a doubly linked list, with references to both the head and tail elements. This doubly linked list can implement all four functions above with O(1) time complexity.
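In Python, collections.deque is exactly such a structure (implemented internally as a doubly linked list of blocks), so in interviews you rarely need to build one from scratch:

from collections import deque

d = deque()
d.append(1)         # add at the "back" (stack push / enqueue)
d.append(2)
d.appendleft(0)     # O(1) insertion at the front as well
print(d.pop())      # remove from the back (stack pop)   -> 2
print(d.popleft())  # remove from the front (dequeue)    -> 0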

Common Operations Cheat Sheet

Because stacks and queues are abstract data types, there is some debate whether time complexity is appropriate for the methods. The debate stems from questions of whether adding and removing from stacks and queues are the only thing they have to do (and time complexity is an implementation detail), or whether the time complexity is intrinsic to the definition of stacks and queues. For the table below, we have chosen the pragmatic option of including the time complexity as part of the definition, since your choice to use a stack or a queue over some other data structure is presumably because they support quick addition and removal of data in the order you want to access it.

If you are using a multipurpose data structure like Python's list (which is a stack and a queue), rather than an optimized queue or stack, you will probably have worse performance than indicated below.

Stack Operation | Description | Time complexity | Mutates structure
push(item) | Pushes item onto the "top" of the stack | O(1) | Yes
pop() | Removes the most recently added item (i.e. the item on "top") from the stack | O(1) | Yes
peek() | Accesses the most recently added item without removing it | O(1) | No
isEmpty() | Returns True if there are no items on the stack, False otherwise | O(1) | No

Queue Operation | Description | Time complexity | Mutates structure
enqueue(item) | Puts item at the end of the queue | O(1) | Yes
dequeue() | Removes the item at the front of the queue | O(1) | Yes
peek() | Accesses the item at the front of the queue without removing it | O(1) | No
isEmpty() | Returns True if there are no items in the queue, False otherwise | O(1) | No

Graphs

Interview Essentials

A graph is an abstract data structure composed of nodes (sometimes called vertices) and edges between nodes. Graphs are a useful way of demonstrating the relationship between different objects.

Some real-life examples of graphs:

Digraphs

In some graphs, the edges are "one way", which means that it's important to distinguish between the "source" or "beginning" of an edge and the "target" or "end" of the edge. These are called "directed edges", and a graph containing vertices and directed edges is called a digraph ("directed graph").

Some examples of digraphs:

Graphs and Trees

If you've worked through either of the trees topics, you have already seen examples of graphs. Trees are special types of graphs - specifically, they are connected acyclic graphs with a single root node. Many problems that you have seen when using trees, such as using BFS or DFS, are also graph problems. The general graph versions tend to be a little trickier, because the graph can have cycles in it. Because there can be multiple paths between nodes, another common type of question is to find the "best" or "shortest" path between two nodes.

They are also useful ways of asking questions such as:

Crucial Terms

We will use the term graph to mean a graph with undirected edges, and digraph for a graph with directed edges.

There are a large number of definitions involving graphs, but most of them are reasonably obvious when looking at the graphical representation of a graph.

Digraph that is weakly, but not strongly, connected. Edges are (0,1), (1,2), (3,2)

Here it is not possible to get from node 1 to node 3, nor from node 3 to node 1. A digraph that is connected if you replace all the edges with undirected edges, such as the example above, is called weakly connected. A strongly connected digraph is a digraph where there is a directed path between any pair of vertices.

A graph with 3 connected components

The technical definition is that a connected component C of G is a connected subgraph for which there is no larger connected subgraph C′ of G that contains C. To find the connected components of a graph, note that each node belongs to exactly one connected component: starting at any node, perform either depth-first or breadth-first search, and every node reached belongs to the same component. Repeating from a not-yet-visited node finds the next component, until every node has been assigned.
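A sketch of that procedure in Python, assuming an adjacency-list graph (described in the representations section below) and using BFS:

from collections import deque

def connected_components(connections):
    # connections[i] is the list of neighbors of node i
    n = len(connections)
    component = [None] * n               # component label for each node
    label = 0
    for start in range(n):
        if component[start] is not None:
            continue                     # already assigned to a component
        # BFS from start labels everything reachable from it
        queue = deque([start])
        component[start] = label
        while queue:
            node = queue.popleft()
            for neighbor in connections[node]:
                if component[neighbor] is None:
                    component[neighbor] = label
                    queue.append(neighbor)
        label += 1
    return component    # e.g. [0, 0, 1, 1, 2] describes 3 components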

There are many more definitions used to discuss graphs, but often different sources will use slightly different definitions for the same term. Even professional mathematicians will ask clarifying questions when dealing with problems in graph theory to ensure they are using the same definitions as the interviewer is. When preparing for graph theory, focus on learning algorithms to solve common problems, rather than trying to learn all the different terms.

Common Representations of Graphs

There are two common representations of graphs. To keep things simple, we'll assume that the n nodes are ordered by integers from 0 to n-1, so we can use position in an array to determine the relevant node. If the nodes have other labels, we can use HashMaps or an external array mapping integers to labels.

  1. Adjacency-list representation

We have a list-of-lists, connections, where connections[i] is a list of the node labels that the ith node is connected to. An edge [i,j] is represented by having j in connections[i]. In an undirected graph, j in connections[i] implies i in connections[j]; for a digraph, connections[i] contains all the nodes you can move to from the ith node.

If the graph is weighted, we can have connections[i] hold tuples (j, weight_ij).

  2. Adjacency-matrix representation

We have an n×n matrix E. For an unweighted graph, E[i][j] is 1 if there is an edge [i,j] in the graph, and 0 otherwise. For a weighted graph, E[i][j] is the weight of edge [i,j], with a reserved value to indicate no edge. This reserved value is often 0 or -1, but it varies with the application. It is essential that the value used to indicate "no edge" cannot actually occur for a real edge in the problem.

So which one should you use? It depends on your use case. If m denotes the number of edges and n denotes the number of vertices, then for connected graphs we know (n-1) ≤ m ≤ n(n-1)/2. For example, the graph below:

4 nodes with all nodes except 0-3 connected

has the representations
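Reading them off the figure (nodes 0 through 3, with every pair connected except 0 and 3):

# Adjacency-list representation
connections = [[1, 2],       # node 0 connects to 1 and 2
               [0, 2, 3],    # node 1 connects to 0, 2, and 3
               [0, 1, 3],    # node 2 connects to 0, 1, and 3
               [1, 2]]       # node 3 connects to 1 and 2

# Adjacency-matrix representation
E = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [0, 1, 1, 0]]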

The properties of the representations are summarized in the table below.

Representation | Lookup time | Memory usage
Adjacency List | O(m) worst case, O(m/n) average | O(m) (assuming connected)
Adjacency Matrix | O(1) | O(n^2)

The adjacency list is never worse in memory than the adjacency matrix, and if the graph is sparse then it can be competitive in lookup time. If we drop the assumption that the graph is connected, then the memory usage of the adjacency list is O(n+m), as we need at least one array index per node in the graph.

For dense graphs, which structure is better for your problem depends on whether you are memory constrained, in which case you should choose an adjacency list, or time constrained with lookups, in which case you should choose an adjacency matrix. You can always choose to store edges in another data structure, such as storing the edges as keys to a hash table or in a set. However, most problems will give you the graph encoded using either an adjacency list or an adjacency matrix.

Classic Graph Algorithms to Expect in Interviews

Finding the Shortest Path

There are two variants for shortest path problems.

In an unweighted graph, we want to find a path between two vertices using the smallest number of edges possible. The go-to algorithm is breadth-first search. For example, finding the number of degrees of separation between "Kevin Bacon" and "Gong Li", using a graph with actors as nodes and edges connecting actors who appeared in the same movie, is a shortest path problem solved with BFS. (At the time of writing, "Kevin Bacon" and "Gong Li" had two degrees of separation.)

If the graph is weighted, the length of the path is defined as the sum of the weights on the edges, not the number of edges. If none of the weights are negative, the standard approach is to use Dijkstra's algorithm. Here we couple BFS with a min-heap to look for the shortest path.

A pseudo-code implementation of Dijkstra's algorithm:

#
# source = index of the source node
# edges  = representation of edges
# n      = number of nodes in graph
#
def shortest_path(source, edges, n){
 # make an array of distances, which are the distance
 # from the source node.
 for index from 0 to n-1 {
    dist[index] = inf
 }
 # the source node is 0 distance from itself
 dist[source] = 0

 # make a min-heap of unvisited nodes
 for index from 0 to n-1 {
   toVisit.add( (dist[index], index) )
 }

 while toVisit is not empty{
   # find the closest node to source so far. Because
   # no edge lengths are negative, there are no alternate
   # routes involving nodes further from source that give
   # a shorter path to this node

   # distance is stored in dist[] array, so discard.
   # keep index
   _, current = toVisit.extractMin();

   for each destination, edge_dist in edges[current]{
     # does a detour through the current node lead to a
     # shorter distance?
     dist_to_node = dist[current] + edge_dist
     if dist_to_node < dist[destination]:
       dist[destination] = dist_to_node
       # need to update the heap / priority queue:
       # find the entry for destination, update its first
       # element to dist_to_node, and restore the heap
       # invariant. Takes O(log n) time.
       toVisit.updateElement( (dist_to_node, destination) )
   }
 }

 return dist
}
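For reference, here is a runnable Python version of the same idea. The standard library's heapq has no "update element" operation, so this common variant instead pushes a new (distance, node) entry whenever a distance improves and skips stale entries as they surface:

import heapq

def shortest_path(source, edges, n):
    # edges[i] is a list of (destination, edge_dist) pairs
    dist = [float("inf")] * n
    dist[source] = 0
    to_visit = [(0, source)]
    while to_visit:
        d, current = heapq.heappop(to_visit)
        if d > dist[current]:
            continue        # stale entry: a shorter path was already found
        for destination, edge_dist in edges[current]:
            candidate = d + edge_dist
            if candidate < dist[destination]:
                dist[destination] = candidate
                heapq.heappush(to_visit, (candidate, destination))
    return dist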

Notice that Dijkstra's algorithm returns the distances between the source node and all the other nodes in the graph. If you are only interested in the distance between two particular nodes, the loop can be interrupted once you reach the target node. If the graph isn't connected, the loop can also return early as soon as dist[current] is infinity, as at that point the algorithm has visited all the nodes that are connected to source.

If you need to find the distances between any pair of nodes, or to find the shortest path between two nodes if there are negative weights on edges, you should use a different algorithm such as Bellman-Ford.

Minimum Spanning Tree

A "spanning tree" of a connected component with n nodes is a subgraph that has no cycles and leaves the n nodes connected. Every spanning tree has n-1 edges; for a typical graph there are multiple spanning trees.

In a weighted graph, the weight of the spanning tree is the total weight of all edges used in the tree. The tree with the lowest weight is called the "minimum spanning tree". A typical example of where a minimum spanning tree is useful is when the edges represent potential connections we could make between objects, and the weights are the costs of building those connections. A minimum spanning tree then gives a way of connecting all the objects for the smallest cost. While the cost of a minimum spanning tree is unique, the tree itself is generally not unique.

There are two well known algorithms for finding a minimum spanning tree: building node-by-node (Prim's algorithm), or edge-by-edge (Kruskal's algorithm).

Prim's algorithm

Prim's algorithm in action

Most of the work in Prim's algorithm is finding the closest vertex to any node in the current solution. That is, we are looking for the lowest-weight edge with one endpoint in the current solution and the other endpoint not in the solution. An efficient approach is to maintain a min-heap of edges with exactly one endpoint in the solution. This slick use of a heap comes at the cost of having to do extra bookkeeping: whenever a node v′ is absorbed into the solution, all edges between v′ and other nodes in the solution have to be removed from the heap, and all edges between v′ and nodes not in the solution have to be added.

Kruskal's algorithm

Kruskal's algorithm in action

Most of the work in Kruskal's algorithm comes from sorting the original edges, which is an O(m log m) operation. Since m ≤ n^2, we have log m ≤ 2 log n, so this is also an O(m log n) operation.

The other place we might be concerned about the amount of work is when we try to determine whether adding an edge e would create a cycle. Using a union-find data structure (also called a disjoint set), we can make this a very fast operation. The idea is that we are going to label vertices according to which connected component of the current solution they belong to. (They all belong to the same connected component of the complete graph!) In more detail:
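A minimal union-find sketch with path compression (union by rank is omitted for brevity): adding an edge creates a cycle exactly when its two endpoints already share a root.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))    # each node starts as its own component

    def find(self, x):
        # follow parent pointers to the component's root,
        # compressing the path along the way
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        # returns False if a and b were already connected,
        # i.e. if the edge (a, b) would create a cycle
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.parent[root_a] = root_b
        return True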

Coloring a Graph

A coloring of a graph is a way of assigning a color to each node such that no two nodes that are connected by an edge have the same color.

A graph is k-colorable if it can be colored with at most k distinct colors. The chromatic number of a graph is the smallest number of colors needed to make a coloring of the graph.

Coloring problems are surprisingly difficult. To determine the chromatic number of an arbitrary graph is an NP-complete problem (in other words, it's really hard). Perhaps more surprisingly, answering the "yes/no" question of whether a graph is 3-colorable is also NP-complete! The simplest approach for coloring, at least in the types of graphs that are likely to come up as an interview task, is to use a backtracking algorithm for assigning the nodes different colors. If you're trying to determine the chromatic number of a graph, try determining if it's 2-colorable using the trick below, then try backtracking with 3 colors, then 4, then 5, and so on until you have found a coloring that works.

The trick for determining if a graph is 2-colorable is to look at its simple cycles (the paths that only repeat the first and last vertex). If all simple cycles have an even number of vertices, the graph is 2-colorable; otherwise it isn't. Technically a "graph" can be 1-colorable, but only if it has no edges.
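In practice, the even-cycle condition is usually checked by simply attempting a 2-coloring with BFS; the attempt fails exactly when some edge joins two nodes of the same color, which happens exactly when there is an odd cycle. A sketch:

from collections import deque

def is_two_colorable(connections):
    n = len(connections)
    color = [None] * n
    for start in range(n):                # loop handles disconnected graphs
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in connections[node]:
                if color[neighbor] is None:
                    color[neighbor] = 1 - color[node]    # alternate colors
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return False          # same color across an edge: odd cycle
    return True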

The coloring of a graph is really just a labeling of its vertices. The term "coloring" originated from mapmakers trying to color maps. Different areas were represented as nodes, and an edge between two nodes meant they shared a border and shouldn't be the same color. As an example, the animation below shows the process of turning the map of the 48 contiguous United States into a graph, and how a coloring of the graph can be used to provide a coloring of the map.

A valid coloring of the 48 contiguous states.

The 4-color theorem states that any planar graph, that is, a graph that can be drawn on a piece of paper without any edges crossing, is 4-colorable. Any map we choose will produce a planar graph, provided we ignore "borders" that consist of a single point (such as the Four Corners in the United States).

A lot of constraint problems can be expressed as graph coloring problems:

The tricky part about setting up problems as coloring problems is that we need to connect "things" in the problem that are not allowed to be together in the solution.

Matching

A matching on an undirected graph is a way of selecting edges so that every vertex is incident to at most one edge. One of the most common matching problems is also the simplest: looking for matches when the nodes have two different types, and only nodes of different types can be joined. These are called bipartite graphs, and can be technically defined as graphs that are 2-colorable.

A famous example is making matchings between medical students who are looking for residencies, and the positions available at hospitals. Each resident can be assigned to only one available position, and each position can be filled by at most one resident. If we create nodes that are the positions and the residents, and an edge between a "resident" node and a "position" node means "this resident is assigned to that position", a valid placement of residents into positions is a matching on this graph. This is similar to the problem of job applicants applying to positions.

Generally, not all matches are created equal. On a weighted graph, you might be trying to select the edges in the graph that make a matching with the highest weight. In the resident/hospital application problem, residents have preferences about which hospitals they want to work at, and hospitals have preferences about the residents they want to work there. In this case, you might have some measure of how to get everyone as close to their first choice as possible. A stable matching is one where neither party is tempted to "defect" for a better offer once they've accepted the position.

In Interviews, Use Graphs When ...

In some cases, the question will be explicitly about graphs, such as asking how many connected components are in a graph given its adjacency matrix. If given a map of cities, and the roads between them along with their distances, and you are asked to find the shortest path between city A and city B, you would probably be thinking about Dijkstra's algorithm.

In other problems, the connection is more subtle. It isn't immediately obvious that the question of how many dinner tables you need to seat people, if certain people cannot be seated at the same table, is a graph coloring problem in disguise. Strictly speaking, it's an NP-complete graph theory problem in disguise! A direct method of approaching this problem, with backtracking, might be equivalent to the way you would solve the graph coloring problem. So what does knowledge of graph theory get you in this instance? Since you already know the problem is hard, you can relax a little when you realize your backtracking algorithm isn't going to run in polynomial time (solving this problem is as hard as solving an NP-complete problem). Or maybe you know that if you want a reasonable running time, you are going to need to memoize some intermediate results.

The big value in graph theory comes in finding the relationship between problems, as well as knowing the best approaches for those classes of problem.

Depth-First Search & Breadth-First Search

Interview Essentials

Depth-first search (DFS) and breadth-first search (BFS) are common algorithms for searching trees or graphs.

In DFS, you start at the root of the tree (or a selected node if it’s a graph) and search as far as possible along each branch before backtracking. DFS is great for searching trees that are deeper than they are wide. In technical interviews, DFS is good for solving puzzles that have only one solution (like mazes).

In BFS, on the other hand, you start at the root of the tree (or a selected node in a graph) and search each subsequent level of nodes before moving on to the next level. BFS is great for searching trees that are wider than they are deep. In an interview, BFS is a great option for problems about file system traversal, printing hierarchies in order, and finding the shortest path between two nodes in a graph.

DFS and BFS are both methods of searching a tree or graph for nodes with particular properties. Both methods allow the reconstruction of a path from the starting node to the solution node(s).

Because they both do the same thing, it can be difficult to decide whether DFS or BFS is the better approach. For problems where we are looking for solutions that are close to the starting node, BFS is a better choice since it searches the closest nodes first. For many problems, the approach depends on the structure of the tree or graph.

Because trees and graphs can contain a lot of nodes, the memory requirements may guide whether you use DFS or BFS. For trees, BFS requires you to store all the nodes at a given level, while DFS requires you to store an entire path from root to leaf. Because trees are often balanced, i.e. structured in a way to minimize depth, DFS will generally require less memory. However, some decision trees and some generated trees can have branches that are infinitely long, in which case BFS would be a better choice.

A full binary search tree with depth d will require O(d) memory for DFS but O(2^d) memory for BFS.

Important Concepts

Related Concepts

Generalized Algorithms

Our goal is to find all nodes that satisfy some condition. We will refer to these nodes as solutions. Here we keep track of all the visited nodes, which can use a substantial amount of memory. The visited set is there to prevent the code from running in an infinite loop, so if we know we have a tree (or, more generally, any acyclic graph), we don't need visited.

Recursive DFS

function DFS_recur(node current, set solutions, set visited):
   add current onto visited

   if current is a solution:
       add current to solutions

   for each edge from current:
       new_state <- state obtained from current by following edge
       if new_state not in visited:
         DFS_recur( new_state, solutions, visited )

If the nodes can be altered, it is often useful to store the "visited" status in the nodes instead of in a separate data structure. Then looking up whether new_state has been visited requires looking at the appropriate field of new_state, rather than searching through the visited data structure.

For a tree, or other acyclic graph, the recursive algorithm for DFS is simpler and requires less memory:

// Only use this version for acyclic graphs!
function DFS_recur(node current, set solutions):
   if current is a solution:
       add current to solutions

   for each edge from current:
       new_state <- state obtained from current by following edge
       DFS_recur( new_state, solutions )

In some languages, there is a limit to how many nested recursive calls you are allowed to make. Exceeding this limit raises a stack overflow error. (Python is notorious for defaulting to a low number of nested recursive calls.) Behind the scenes, calling a function recursively pushes all your variables to the call stack. We can get around this limit by implementing our own stack in an iterative version.

BFS and Iterative DFS

BFS is usually implemented using a queue (First In, First Out) to store nodes, and looks quite different from our recursive DFS. By writing an iterative version of DFS using an explicit stack (First In, Last Out) to store nodes, we can see that they're quite similar.

function BFS_iter(node start_node):
   create empty set visited
   create empty queue Q
   create empty set solutions

   add start_node to visited
   Q.enqueue(start_node)

   while Q is not empty:
     current = Q.dequeue()
     if current is a solution:
       add current to solutions

     for each edge from current:
       new_state <- state obtained from current by following edge
       if new_state not in visited:
         add new_state to visited
          Q.enqueue(new_state)
   return solutions

Compare this to an iterative version of DFS:

function DFS_iter(node start_node):
 create empty set visited
 create empty stack Stk
 create empty set solutions

 add start_node to visited
 Stk.push(start_node)

 while Stk is not empty:
   current = Stk.pop()
   if current is a solution:
     add current to solutions

   for each edge from current:
     new_state <- state obtained from current by following edge
     if new_state not in visited:
       add new_state to visited
       Stk.push(new_state)

 return solutions

We see that, other than the names of the methods for adding and removing elements (push and pop vs. enqueue and dequeue), the algorithms are almost exactly the same.

If we have a tree or other acyclic graph, the code can be simplified further by dropping all the lines that refer to the visited set.

Pathfinding

For some problems, such as finding a path through a maze, finding the node (the exit) isn't that interesting. Instead, we're looking for the path to the desired node.

An easy modification that allows a reconstruction of a path is to make the visited elements contain two pieces of information: the node that was visited, and where it was visited from. Once you have the desired node, you can follow the from nodes back to the start to recover the path.
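A sketch of this modification on top of BFS, where neighbors(node) is assumed to return the nodes adjacent to node: visited becomes a map from each node to the node it was discovered from, and the path is read back from the target.

from collections import deque

def bfs_path(start, target, neighbors):
    came_from = {start: None}       # node -> the node it was visited from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            # follow the "from" links back to the start, then reverse
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for new_state in neighbors(current):
            if new_state not in came_from:
                came_from[new_state] = current
                queue.append(new_state)
    return None                     # no path exists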

For An Interview, Expect To See Problems Like...

Backtracking

Interview Essentials

Backtracking is a technique used to build up to a solution to a problem incrementally. These "partial solutions" can be phrased in terms of a sequence of decisions. Once it is determined that a "partial solution" is not viable (i.e. no subsequent decision can convert it into a solution), the backtracking algorithm retraces its steps to the last viable "partial solution" and tries again.

Visualizing the decisions as a tree, backtracking has many of the same properties as depth-first search. The difference is that depth-first search will visit every node until it finds a solution, whereas backtracking skips branches that are not viable.

Because backtracking breaks problems into smaller subproblems, it is often combined with dynamic programming, or a divide-and-conquer approach.

Common interview tasks that use backtracking are crossword puzzles, Sudoku solvers, splitting strings, and cryptarithmetic puzzles.

Important Concepts

Related Concepts

Here are some concepts that will help you implement a backtracking algorithm.

Generalized Algorithm

In the algorithms below, it is assumed that each decision is irreversible, so there is only one path to each state. If modeling a game like chess, you would have to be more careful, as it is possible to arrive at the same state multiple different ways. If it is possible to reach each state multiple ways, then we would also need to keep track of which states we had already visited.

A recursive implementation of a backtracking algorithm takes the general form

function doBacktrack( current ):
   if current is a solution:
       return current
   for each decision d from current:
       new_state <- state obtained from current by making decision d
       if new_state is viable:
           sol <- doBacktrack(new_state)
           if sol is not None:
               return sol
   # indicate there is no solution
   return None    

As noted in the DFS & BFS section, some languages limit how many nested recursive calls you are allowed to make, and exceeding this limit raises a stack overflow error (Python is notorious for defaulting to a low limit). We can get around this limit by implementing our own stack:

function doBacktrackIterative(start):
   stack <- initialize a stack
   stack.push(start)

   while (stack not empty):
       current = stack.pop()

       if current is a solution:
           return current

       for each decision d from current:
           new_state <- state obtained from current by making decision d
           if new_state is viable:
               stack.push(new_state)

   return None

For An Interview, Expect To See Problems Like...

  1. Placing n queens on an n x n chess board so no queen can be taken

We know that we need one queen in every row, and one queen in every column.

Here are the states that the backtracking algorithm explores for a 4x4 board. Note that naively there are 4! = 24 complete configurations with 4 queens on the board, if we place the queens row-by-row, and avoid reusing a column that is already used. With backtracking, many configurations are eliminated early. This algorithm shows all solutions, instead of stopping at the first one found.

Backtracking for 4 queens on a 4x4 board

  2. Finding a subset of a set of positive integers S that sums to a particular number

We have a set of numbers S = {10, 7, 5, 18, 12, 20, 8}, and a target value T = 23. The goal is to find a subset s of S whose elements sum to the target value, or report that no such subset exists (see the sketch after this list).

  3. Finding the best possible move in a game

A generic approach for finding the best strategy (sequence of moves) in a game is:
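Here is the promised sketch of the subset-sum example as a recursive backtrack. The decision at step i is "include S[i] or not", and branches whose remaining target goes negative are pruned (which is valid because the integers are positive):

def subset_sum(S, target, i=0, chosen=None):
    if chosen is None:
        chosen = []
    if target == 0:
        return chosen               # found a subset summing to the target
    if target < 0 or i == len(S):
        return None                 # not viable: backtrack
    # decision 1: include S[i]
    sol = subset_sum(S, target - S[i], i + 1, chosen + [S[i]])
    if sol is not None:
        return sol
    # decision 2: skip S[i]
    return subset_sum(S, target, i + 1, chosen)

# subset_sum([10, 7, 5, 18, 12, 20, 8], 23) -> [10, 5, 8]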

Sorting

Interview Essentials

Sorting algorithms are used to put the elements of an array in a certain order, generally using a comparison function to determine order (a generalization of less than). Some comparison functions will allow for ties between different objects. A stable sorting algorithm will guarantee that if two elements "tie", then whichever element occurred first in the original array will also occur first in the sorted array. An unstable sorting algorithm will only guarantee that the final elements are in order, and makes no guarantee about the order of "tied" elements.

Being asked to implement a sorting algorithm from scratch used to be a frequently asked interview question, but this is becoming less popular. Just in case, though, it's worth knowing how to implement two of the more useful sorting algorithms:

For other sorting algorithms, you might be asked about their time complexity, their space complexity, and whether or not the sort is stable. In decreasing order of probability of coming up in an interview, the other common sorts are:

Quick Sort

Important Concepts

Time Complexity
Space Complexity

Generalized Algorithm

  1. Pick a pivot element from the array.
  2. Reorder the array so that all the elements with values less than the pivot now come before the pivot, while all elements with values greater than the pivot now come after it. Equal values can go either before or after. After this partitioning operation, the pivot element will be in its final position.
  3. Recursively apply these steps to the sub-array of elements with smaller values.
  4. Recursively apply these steps to the sub-array of elements with greater values.
  5. The base case of the recursive algorithm is arrays with a size 0 or 1, which don't need to be sorted.
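A short Python sketch of these steps (this version favors clarity over the in-place partitioning usually used in practice):

import random

def quick_sort(arr):
    if len(arr) <= 1:
        return arr                                 # base case: already sorted
    pivot = random.choice(arr)                     # step 1: pick a pivot
    smaller = [x for x in arr if x < pivot]        # step 2: partition
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    # steps 3 and 4: recurse on each side
    return quick_sort(smaller) + equal + quick_sort(greater)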

Merge Sort

Important Concepts

Time Complexity

Space Complexity

Generalized Algorithm

Merge sort is a recursive algorithm. Suppose the function is called merge_sort(L), and that it returns the sorted array.

  1. Given an array, if its length is bigger than 1, split it into two pieces, left and right. (The pieces should be as close to evenly sized as possible.) If the array has length 1 or 0, it is automatically sorted, so just return it.
  2. Compute mL = merge_sort(left) and mR = merge_sort(right). This gives you sorted copies of the left and right halves of the array.
  3. Keep a reference to the current position in each of mL and mR, called ref_L and ref_R respectively. Build up a new array ans that merges the contents of mL and mR by repeatedly taking the smaller of the elements at ref_L and ref_R, and advancing the reference that was taken. This merge guarantees that ans is in order from smallest to greatest.
  4. Return the merged array ans.
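The same steps in Python, with the merging loop of step 3 written out:

def merge_sort(arr):
    if len(arr) <= 1:
        return arr                     # length 0 or 1: trivially sorted
    mid = len(arr) // 2
    mL = merge_sort(arr[:mid])         # sorted left half
    mR = merge_sort(arr[mid:])         # sorted right half
    # merge: repeatedly take the smaller front element
    ans, ref_L, ref_R = [], 0, 0
    while ref_L < len(mL) and ref_R < len(mR):
        if mL[ref_L] <= mR[ref_R]:     # <= keeps the sort stable
            ans.append(mL[ref_L])
            ref_L += 1
        else:
            ans.append(mR[ref_R])
            ref_R += 1
    ans.extend(mL[ref_L:])             # append whatever remains
    ans.extend(mR[ref_R:])
    return ans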

Overview of the Rarer Sorting Algorithms

Insertion Sort

An in-place sorting algorithm.

  1. Start with the index current at 0. (Insertion sort ensures that all elements up to and including the element at index current are in sorted order, which is trivial for the first element).
  2. All elements before current are sorted, elements after current are unsorted. Get the first element after current.
  3. Find where the next element fits in the sorted portion of the array.
  4. Now the sorted portion of the array is one index larger. Move the current index one place further toward the end.

Repeat steps 2 - 4 until the current index is at the end. The resulting array is completely sorted (all elements occur before current).

Insertion sort does well in cases where the array is already partially sorted, as it can "move past" elements that are already in order.

Heap Sort

An in-place sorting algorithm.

  1. Pass through the array one element at a time, converting the subarray processed so far into a heap. The array itself serves as the heap's storage, so no extra space is needed.
  2. Repeatedly extract the extreme element. In the usual in-place version the heap is a max-heap: swap the root (the maximum) with the last element of the heap, shrink the heap by one, and restore the heap property. Each extracted element lands in its final position, so when the heap is empty the array is sorted.

Selection Sort

An in-place sorting algorithm.

  1. Start with the index current at 0. (Selection sort maintains the invariant that the first current elements of the array are the current smallest elements, in sorted order. This invariant is trivial when current is zero.)
  2. Scan the subarray of the elements from current to the end of the list. Select the smallest element in the subarray.
  3. Swap the element at current with the smallest element found in step 2.
  4. The first current smallest elements are at the beginning of the array. Move the index current forward one place.

Repeat steps 2 - 4 until the current index is at the end of the array. The resulting array is sorted.

A lot of the time in selection sort is spent finding the minimum element in the unsorted portion of the array. Heap sort (discussed above) is essentially a selection sort in which the heap is used to speed up the repeated selection of the minimum with extract-min.

Bubble Sort

An in-place sorting algorithm. Bubble sort is a very poor sorting algorithm; its use in interviews is limited to talking about why it's a poor choice. Here is how bubble sort would operate on an array a:

  1. Start by looking at the first two elements, a[0] and a[1]. If they are out of order, swap them.
  2. Then compare the two elements, a[1] and a[2]. If they are out of order, swap them.
  3. Iterate this procedure, putting items a[i] and a[i+1] in the correct order, until you reach the end. This process of going through the array with up to n-1 swaps is called a pass.
  4. Repeat this process until the array is fully sorted, which can take at most n-1 passes.

Head-to-Head Table of Comparison-Based Sorting Algorithms

If only one entry is shown for complexity, it means that average and worst-case complexity scale the same way.

Sorting algorithm | Time Complexity (Avg / Worst) | Additional Space Complexity (Avg / Worst) | Deterministic? | Stable?
Quick Sort | O(n log n) / O(n^2) | O(log n) | No | No
Merge Sort | O(n log n) | O(n) | Yes | Yes
Insertion Sort | O(n^2) (*) | In-place (O(1)) | Yes | Yes
Heap Sort | O(n log n) | O(1) | Yes | No
Bubble Sort | O(n^2) | In-place (O(1)) | Yes | Yes
Selection Sort | O(n^2) | In-place (O(1)) | Yes | Implem. dependent

(*) Insertion sort is fast if the input is already partially sorted. If no item is more than k positions from its final position, the runtime is O(nk).

Example of different sorts

We show the sequence of swaps that the different sorting algorithms make to put the array [6,5,3,1,8,7,2,4] in order.

For insertion and bubble sort, we only included steps where the list changed, to reduce the number of steps shown. Merge sort was excluded because it isn't a simple swap-based sort, and quick sort was excluded because it isn't deterministic.

Swap # | Selection Sort | Insertion Sort | Bubble Sort
0 | [6, 5, 3, 1, 8, 7, 2, 4] | [6, 5, 3, 1, 8, 7, 2, 4] | [6, 5, 3, 1, 8, 7, 2, 4]
1 | [1, 5, 3, 6, 8, 7, 2, 4] | [5, 6, 3, 1, 8, 7, 2, 4] | [5, 6, 3, 1, 8, 7, 2, 4]
2 | [1, 2, 3, 6, 8, 7, 5, 4] | [5, 3, 6, 1, 8, 7, 2, 4] | [5, 3, 6, 1, 8, 7, 2, 4]
3 | [1, 2, 3, 6, 8, 7, 5, 4] | [3, 5, 6, 1, 8, 7, 2, 4] | [5, 3, 1, 6, 8, 7, 2, 4]
4 | [1, 2, 3, 4, 8, 7, 5, 6] | [3, 5, 1, 6, 8, 7, 2, 4] | [5, 3, 1, 6, 7, 8, 2, 4]
5 | [1, 2, 3, 4, 5, 7, 8, 6] | [3, 1, 5, 6, 8, 7, 2, 4] | [5, 3, 1, 6, 7, 2, 8, 4]
6 | [1, 2, 3, 4, 5, 6, 8, 7] | [1, 3, 5, 6, 8, 7, 2, 4] | [5, 3, 1, 6, 7, 2, 4, 8]
7 | [1, 2, 3, 4, 5, 6, 7, 8] | [1, 3, 5, 6, 7, 8, 2, 4] | [3, 5, 1, 6, 7, 2, 4, 8]
8 |  | [1, 3, 5, 6, 7, 2, 8, 4] | [3, 1, 5, 6, 7, 2, 4, 8]
9 |  | [1, 3, 5, 6, 2, 7, 8, 4] | [3, 1, 5, 6, 2, 7, 4, 8]
10 |  | [1, 3, 5, 2, 6, 7, 8, 4] | [3, 1, 5, 6, 2, 4, 7, 8]
11 |  | [1, 3, 2, 5, 6, 7, 8, 4] | [1, 3, 5, 6, 2, 4, 7, 8]
12 |  | [1, 2, 3, 5, 6, 7, 8, 4] | [1, 3, 5, 2, 6, 4, 7, 8]
13 |  | [1, 2, 3, 5, 6, 7, 4, 8] | [1, 3, 5, 2, 4, 6, 7, 8]
14 |  | [1, 2, 3, 5, 6, 4, 7, 8] | [1, 3, 2, 5, 4, 6, 7, 8]
15 |  | [1, 2, 3, 5, 4, 6, 7, 8] | [1, 3, 2, 4, 5, 6, 7, 8]
16 |  | [1, 2, 3, 4, 5, 6, 7, 8] | [1, 2, 3, 4, 5, 6, 7, 8]

Dynamic Programming: Basic

Interview Essentials

Dynamic programming is a powerful technique that helps you solve complex problems by breaking them down into simpler recurring subproblems. This is generally done by taking a recursive algorithm and finding the repeated calls, the results of which you can then store for future recursive calls. Since you've computed the answer for an input and saved it for later, you don't have to recompute it the next time around!

In technical interviews, dynamic programming is especially useful when you’re solving problems about optimization, search, and counting. If, as you're analyzing a problem, you see that it can be broken down into smaller overlapping subproblems, it's a good signal that you can employ dynamic programming to solve it.

Important Concepts

There are two common dynamic programming strategies that you'll need to familiarize yourself with: bottom-up and top-down.

Generalized Algorithm

  1. Find a recursive or iterative solution to the problem first (usually these will have bad run times). This helps you identify the subproblems, which are "smaller" instances of the original problem.
  2. Determine whether the subproblems are overlapping. If they're overlapping, a recursive algorithm will be solving the same subproblem over and over. (If they're not overlapping and the recursive algorithm generates new subproblems each time it runs, DP isn't a good solution; try a divide-and-conquer solution like merge sort or quick sort instead.)
  3. Store/cache the results for every subproblem.
  4. Once the subproblems have solutions, the original problem is easy to solve!
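As a minimal illustration of both strategies, here is the classic Fibonacci example (not one of the interview problems below, just the smallest case with overlapping subproblems):

from functools import lru_cache

# Top-down: recursion plus a cache of already-solved subproblems
@lru_cache(maxsize=None)
def fib_top_down(n):
    return n if n < 2 else fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up: solve the smallest subproblems first and build upward
def fib_bottom_up(n):
    if n < 2:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr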

In Interviews, Expect to See Problems Like...

Common Techniques: Basic

Interview Essentials

There are some common problem-solving techniques that you should keep in your back pocket during technical interviews. They’ll come in handy for a lot of different questions, and interviewers really want to see that you know them! Three of the most common problem-solving techniques you'll need to employ during interviews are:

Binary Search

Important Concepts

Time complexity
Space Complexity

Generalized Algorithm

  1. Choose an element in the middle. Is that the one you’re looking for? Great! Otherwise, determine whether the element you chose is smaller or greater than the one you’re looking for.
  2. If the element you're looking for is smaller than the one you chose, you know that you don't need to worry about any elements to the right of your initial choice. Alternately, if the element you're looking for is larger than the one you chose, you know that you don't need to worry about any elements to the left of your initial choice. Either way, you've just halved the number of elements that you need to look at.
  3. Do the same thing again and again, until you've found the element you're looking for!

Pseudocode

def binary_search(l, value):
    # Set a min and a max index (the 0th and the (n-1)th elements)
    min = 0
    max = len(l) - 1
    # Keep guessing until the search range is empty
    while min <= max:
        # Set the guess to the midpoint of min and max, rounding down
        guess = (min + max) // 2
        # If the element at guess is larger than the target, reset max to guess - 1
        if l[guess] > value:
            max = guess - 1
        # If the element at guess is smaller than the target, reset min to guess + 1
        elif l[guess] < value:
            min = guess + 1
        # Otherwise the element at guess is the one you're looking for
        else:
            return guess
    # The value isn't in the list
    return -1

In Interviews, Expect to See Problems Like...


Two Pointers

Two-pointer techniques come up frequently when manipulating singly linked lists. Having pointers to two different nodes can be used to overcome the restriction that you can only traverse a singly linked list in one direction.
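A classic instance (assuming nodes with a next pointer, as in the linked lists section) is the fast/slow pointer trick for finding the middle of a list in one pass:

def find_middle(head):
    # "slow" advances one node per step, "fast" advances two;
    # when fast reaches the end, slow is at the middle
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow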

Important Concepts

There are two main strategies for this technique:

Time Complexity

In Interviews, Expect to See Problems Like...


Prefix Sums

Prefix sums are a method of pre-computation that allows us to calculate the sum of any contiguous block of an array in O(1) time! The initial computation of the prefix sum array takes O(n) time.

Generalized Algorithm

To create the prefix sum array

In general, the prefix sum of an array x is an array y that can be calculated as:

y_0 = 0, and y_i = y_{i-1} + x_{i-1} for i > 0

The array y will have one more element than x.

To use the prefix sum array

To get the sum of the elements x_a + x_{a+1} + ... + x_b, we would calculate y_{b+1} - y_a.

Note: Some sources may not use the leading 0 at the beginning of the prefix sum array. The convention used here wastes a little bit of space, but simplifies some of the bounds checking when using the array.
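In Python, following the convention above (length n+1, with a leading 0):

def build_prefix_sum(x):
    y = [0] * (len(x) + 1)
    for i in range(1, len(y)):
        y[i] = y[i - 1] + x[i - 1]    # y[i] = sum of x[0..i-1]
    return y

x = [3, 1, 4, 1, 5]
y = build_prefix_sum(x)               # [0, 3, 4, 8, 9, 14]
# sum of x[1] + x[2] + x[3] (a=1, b=3) is y[b+1] - y[a]:
print(y[4] - y[1])                    # 1 + 4 + 1 = 6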

Time Complexity
Space Complexity

Given a prefix sum array, it is possible to recover the original array by looking at the difference between neighboring elements. This means we can choose to overwrite the original array with the prefix array, or we can choose to make a separate array.

In Interviews, Expect to See Problems Like...

Strings

Interview Essentials

The string data type is used to represent text, and string values are typically enclosed in either single or double quotation marks. Even though strings themselves are a basic data type, interviewers come up with questions that are anything but simple! Make sure that you are very familiar with the string methods of your chosen interviewing language. In most languages, strings are implemented as character arrays, which means that string questions and array questions are interchangeable in a lot of cases.

Bits

Interview Essentials

Binary digits, or bits, are variables that can hold two values: 0 or 1. Bits are grouped together in bytes, the smallest unit of memory that can be allocated. Most interviewers will allow you to assume there are 8 bits in a byte, so each byte can take one of 2^8 = 256 possible values.

Some interviewers will ask you to solve problems that are explicitly about manipulating bits. For these questions, it's helpful to know how integers are represented in binary, so that you can recognize that 7 is 0111₂. It is also helpful to become familiar with the two's-complement representation for negative integers.

The bitwise operations (AND, OR, XOR, NOT) can be incredibly useful even when the task is not explicitly about bits. Bitwise operations are incredibly fast, and you can compress N yes/no decisions into N bits (rather than N bytes if we stored the numbers 0 and 1 for each decision).

Bitwise operations are particularly useful when doing image manipulation, or implementing the hash functions of hash tables. Bits also appear in an advanced dynamic programming technique: when looking at which of N objects we are going to select, we can use an N-bit integer to represent which objects were selected (if the ith bit is 1, the ith object is selected).

Table of contents

Our information about bits is broken into three parts:

Many applications of bits don't actually require you to think about them as integers. Individual bits can be used to represent a decision. For example, the subset-sum problem asks if there is a subset of a given set that adds up to a given number. As an example, is there a subset of {2, 3, 4, 6, 21, 14} that sums to 12? One way of handling this is to arrange the elements in an array (sets have no order!) and label each subset by using the ith bit to record whether the ith element is in the subset or not. For example, the subset {2, 4, 6} (which does sum to 12) can be represented as the bit-string 101100 because

Element index  | 2 | 3 | 4 | 6 | 21 | 14
Subset bitmask | 1 | 0 | 1 | 1 | 0  | 0

In order to use bits this way, and effectively manipulate the bits, it is important to know bitmasks, which in turn depend on bitwise operations. It is not as important to know that most programs think of 101100₂ as the number 44.

We have included how integers are represented as bits because:

What you really need to know about integers is

Integer representations in different bases

Getting familiar with bases mathematically.

The most familiar representation of integers is in base 10 (i.e. using 10 digits, 0 through 9), with an optional sign in front. (The sign will be treated a little bit differently when coding N-bit integers.) Because this is the "normal" way of representing a number, we often explicitly say that the number is written in base 10. When we see the number 101, we generally assume it is one-hundred and one. If we want to be explicit, we can write the base as a subscript, such as 101₁₀.

One-hundred and one can be written as

101₁₀ = 1·100 + 0·10 + 1·1 = 1·10^2 + 0·10^1 + 1·10^0

We can summarize this somewhat systematically in a table:

Power        | 2    | 1  | 0
10^power     | 100  | 10 | 1
Digit        | 1    | 0  | 1
Contribution | +100 | 0  | +1

Result: 101₁₀

If we wanted to represent one-hundred and one in powers of 8, for example, we would look at the powers of 8 that are less than 101, namely 1, 8, and 64. Each digit can take 8 values (0 through 7), so we know a number written in base 8 will never have an 8 or 9 in it.

101₁₀ = 1·64 + 4·8 + 5·1 = 1·8^2 + 4·8^1 + 5·8^0 = 145₈

In table form:

Power        | 2   | 1   | 0
8^power      | 64  | 8   | 1
Digit        | 1   | 4   | 5
Contribution | +64 | +32 | +5

Result: 101₁₀ (written 145₈)

N-bit unsigned integers

When writing an unsigned number in binary, or base 2, we are expanding it in powers of 2. The ith bit, counted from the right, tells us how many copies of 2^i we need when expanding our number in powers of two. As an example, to "decode" an 8-bit number such as 01100111₂:

Power        | 7   | 6   | 5   | 4  | 3 | 2  | 1  | 0
2^power      | 128 | 64  | 32  | 16 | 8 | 4  | 2  | 1
Binary digit | 0   | 1   | 1   | 0  | 0 | 1  | 1  | 1
Contribution | 0   | +64 | +32 | 0  | 0 | +4 | +2 | +1

Result: 103₁₀

Because 64 + 32 + 4 + 2 + 1 = 103, the number 01100111₂ = 103₁₀.

If the number of bits is fixed at N (as it is for the short and long datatypes in C/C++/Java), then we can only represent 2^N different values. The smallest number that can be represented is 0 (where all bits are zero), and the highest number that can be represented is 2^N - 1 (where all the bits are set to one).

N-bit signed integers and 2's-complement

An "obvious" way of representing negative numbers would be to have the leftmost bit be 0 for a positive number and 1for a negative number, much like the ± signs used in math. This encoding isn't widely used because it gives the number 0two encodings. Demonstrating with 4-bit integers, 00002 and 10002 would both represent zero!

The standard encoding method is called the two's complement, which works in the following way:

For example, here are all 3-bit signed integers:

Bits | Expansion                    | Value
000  | 0·(-2^2) + 0·2^1 + 0·2^0     | 0₁₀
001  | 0·(-2^2) + 0·2^1 + 1·2^0     | 1₁₀
010  | 0·(-2^2) + 1·2^1 + 0·2^0     | 2₁₀
011  | 0·(-2^2) + 1·2^1 + 1·2^0     | 3₁₀
100  | 1·(-2^2) + 0·2^1 + 0·2^0     | -4₁₀
101  | 1·(-2^2) + 0·2^1 + 1·2^0     | -3₁₀
110  | 1·(-2^2) + 1·2^1 + 0·2^0     | -2₁₀
111  | 1·(-2^2) + 1·2^1 + 1·2^0     | -1₁₀

With 3 bits, we can represent the 2^3 = 8 values from -4 to 3. An N-bit signed integer can represent the values from -2^(N-1) to +2^(N-1) - 1.

Bitwise operations

A binary bit function is any function that takes two bits and returns a single bit. It is common to refer to a bit being 1 as True, and a bit being 0 as False. If x and y are individual bits, here are the most common binary bit functions:

Table of bitwise operations

x | y | OR(x,y) | AND(x,y) | XOR(x,y)
0 | 0 | 0       | 0        | 0
0 | 1 | 1       | 0        | 1
1 | 0 | 1       | 0        | 1
1 | 1 | 1       | 1        | 0

Most variables we deal with have multiple bits. For convenience, we label the ith bit of a variable X as X_i. Then we have the following bitwise operations between two N-bit variables X and Y:

Given a bit function, we can always define the corresponding bitwise operation that applies the bit function bit-by-bit. As an example, let's apply each of these bitwise operations to the 8-bit unsigned ints 01101110 (110₁₀) and 11000110 (198₁₀):

* Example of bitwise AND: `01101110 & 11000110 = 01000110` or `110 & 198 = 70`


 01101110
& 11000110
==========
 01000110

* Example of bitwise OR: `01101110 | 11000110 = 11101110` or `110 | 198 = 238`


 01101110
| 11000110
==========
 11101110

* Example of bitwise XOR: `01101110 ^ 11000110 = 10101000` or `110 ^ 198 = 168`


 01101110
^ 11000110
==========
 10101000

In addition to the binary bitwise operations &, |, and ^, there is the bitwise NOT operator (usually referred to as ~ in programming languages). As the name suggests, every bit in ~X is different from the corresponding bit in X.

Bit shifting

Our last operations aren't really bitwise operations (they don't act bit-by-bit), but they are important enough to be included anyway. They are the shifts: the left shift X << i, which moves every bit of X i positions toward the high end (filling with zeros on the right), and the right shift X >> i, which moves every bit i positions toward the low end.

Bitmasks

There are many different interpretations possible for what each bit represents. In the introduction, we used the ith bit to represent whether the ith element was in the subset for the subset-sum problem.

For bitmasks we'll make an analogy to a switch: if bit i is 1, we say the ith switch is on, and if bit i is 0, we say the ith switch is off. The 0th bit is at the right end, and we count back toward the front. This has the nice property that the ith bit is the coefficient of 2^i.

Getting the ith bit / Checking the state of the ith switch

We can find the value of the ith bit of n by using (n & (1 << i)) != 0. Can you see why it works?

We use the following properties of the AND function: AND(0, x) = 0 and AND(1, x) = x.

The number 1 << i has a 0 everywhere except for a one in the ith position. Therefore n & (1 << i) forces every bit except the ith bit to zero. Looking at examples for finding the 4th and 3rd bits:

  11010001    n
& 00010000    (1 << 4)
==========
  00010000    (n & (1 << 4))

  11010001    n
& 00001000    (1 << 3)
==========
  00000000    (n & (1 << 3))

If the `i`th bit is zero, then the output is zero. Checking if the output is non-zero will tell us the value of the `i`th bit.

Setting the ith bit / Turning the ith switch on

We can get a number that is n with the ith position changed to 1 with the code n | ( 1 << i ). Can you see why it works?

We are using the following properties of the OR function: OR(0, x) = x and OR(1, x) = 1.

Looking at two examples:

  11010001    n
| 00010000    (1 << 4)
==========
  11010001    (n | (1 << 4))

  11010001    n
| 00001000    (1 << 3)
==========
  11011001    (n | (1 << 3))

Since every bit except the ith bit of n is OR-ed with zero, they are all copied into the new variable.

Clearing the ith bit of n / Turning the ith switch off

We can force a bit to zero using AND(0,x) = 0. We can preserve the other bits by using AND(1,x) = x. So if we can make a number mask that has all of its bits as 1 except the ith bit, we want to return n & mask.

To generate mask, note that we know how to set only the ith bit with (1 << i). We want the bitwise NOT of this: mask = ~(1 << i).

So our final result is n & (~ (1 << i)).

Clearing the i least significant bits of n / Turning the first i switches off

This is interesting because it shows how to set multiple bits at once. We want to set the i rightmost (i.e. lowest powers of 2) bits to zero. Forcing bits to zero suggests using AND. Ideally we would make a mask = 1111...100000 whose rightmost i bits are zero and whose other bits are all one. Then our answer would be n & mask.

How do we create mask? If we start with a variable that is all 1s, then we can left shift it i times to get the i rightmost bits set to 0. The number that has all of its bits set to one for a signed int is -1. So mask = (-1 << i).

Our final result is n & (-1 << i).

Toggling the ith bit of n / Toggling the ith switch

XOR is great here. Like OR, XOR(0,x) = x so we can use a mask with zero bits to preserve the bits we want to leave alone. We also have XOR(1,x) = NOT(x)! In fact, we can use this toggle property to eliminate the ~ operator altogether, because ~X is the same as X ^ (-1).

To toggle the ith bit of n, we would return n ^ (1 << i).

Updating the ith bit of n to x.

Let's say that we know x is 1 or 0. How would we use it to set the ith bit to x, given that we used AND to clear the bit (set it to 0) but OR to set the bit (set it to 1)?

Here is one approach: return (n & ~(1 << i)) | (x << i). This forces the ith bit to zero, then ORs it either with 0 (if x is zero, leaving the bit cleared) or with a number that has only the ith bit set (forcing that bit to 1 if x is 1).
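Collecting the tricks above into one place, here is a sketch of each operation (the function names are ours; the expressions are the ones derived above):

def get_bit(n, i):
    # non-zero exactly when the ith bit of n is 1
    return (n & (1 << i)) != 0

def set_bit(n, i):
    return n | (1 << i)

def clear_bit(n, i):
    return n & ~(1 << i)

def clear_low_bits(n, i):
    # clears the i least significant bits
    return n & (-1 << i)

def toggle_bit(n, i):
    return n ^ (1 << i)

def update_bit(n, i, x):
    # x must be 0 or 1
    return (n & ~(1 << i)) | (x << i)

# with n = 0b11010001:
# get_bit(n, 4)         -> True
# set_bit(n, 3)         -> 0b11011001
# clear_bit(n, 4)       -> 0b11000001
# clear_low_bits(n, 4)  -> 0b11010000
# toggle_bit(n, 0)      -> 0b11010000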

Final comments on bits in Interviews

This is one of the longer overviews! There is a lot of background detail, and bit problems are generally very compact. The hard part isn't the length of the code that you're writing, but becoming comfortable thinking in bits, knowing some standard tricks, and getting a feel for when to use OR (forcing a bit to 1), AND (forcing a bit to 0), or XOR (forcing a bit to toggle).

Common Techniques: Advanced

Interview Essentials

There are some common approaches to problems that, when you use them, can dramatically improve your programs. These techniques are more advanced, because it takes practice to be able to identify problems where these approaches work well. Two tricky approaches that apply to a broad range of problems are:

This section also looks at problems whose solutions utilize disjoint sets, which allow us to group a collection of objects into non-overlapping sets. This fast and powerful data structure enables us to quickly label objects as belonging together.

RegEx

Interview Essentials

A regex is a string that encodes a pattern to be searched for (and perhaps replaced). Interviewers rarely test your ability to write a complicated regex for its own sake, but regexes can be used in other interview tasks such as:

Proficiency with regex will enable you to use tools like awk and sed to quickly massage files into the desired format, use tools like grep to search for information more quickly, and enable you to perform bulk renaming of files. They also make the "search" feature of most text editors far more useful.

Writing a good regex is tough: you have to match everything you want, and nothing that you don't. If you don't feel confident with regex tricks, stick to very simple regexes during technical interviews to avoid getting flustered.

Regex is generally used interchangeably with the term regular expression, but there is a technical definition of regular expression that means the strings it searches for have to be identifiable with a finite state machine (i.e. it works on so-called regular languages). Most regexes in practical use are more general than the strict definition of a regular expression.

A regex engine interprets your regex string and processes another string to see if it matches. Each programming language has a standard regex engine, and there is slight variation in supported features and syntax, at least when using ASCII character sets.

In what follows, the regex strings will not be enclosed by double quotation marks. Even though the regexes are strings, there are different (non-portable) ways of expressing the strings to minimize the number of character escapes that need to be performed. For example, in Python it is standard to use "raw strings" to make a regex, so we would search for a digit using r'\d'. In this tutorial, we'll denote the same regex as \d.

We can broadly break the pieces of a regex up into character classes, quantifiers, anchors, and modifiers.

The syntax for modifiers varies the most, so we won't discuss them here. A regex doesn't need to specify all of the different attributes. For example, if we want to find the string "CodeFights" anywhere in our string, the regex CodeFights would work well. If we wanted to match only strings that started with CodeFights, we would use ^CodeFights, where ^ plays the role of an anchor. By default, if no anchor is specified, most regex engines will look anywhere in the string for a match.

Character classes

Some general rules:

Character class | What it matches
a               | The literal letter a (as an example)
[^a]            | Everything except a. The square brackets here are important: ^a would match strings that begin with a!
[a-z]           | Any lowercase letter a-z
[^a-e]          | Everything except the letters a-e
[a-zA-M]        | Any lowercase letter, or uppercase letters up to and including M
\d              | Any digit. Shorthand for [0-9]
\D              | Any non-digit. Shorthand for [^\d]
\w              | Any "word" character. The same as [a-zA-Z\d_]
\W              | Any character except "word" characters
\s              | Any whitespace character (space, tab, newline, carriage return, and vertical tab)
\S              | Any non-whitespace character
.               | Any character (except newlines, in most regex engines)
\1              | The first "captured group"

Quantifiers

Often we are interested in matching more than one character. For example, if we are trying to match car license plates of the format "3 upper case letters, followed by 4 digits", we could use the ugly regex [A-Z][A-Z][A-Z]\d\d\d\d (but please see the warning after the table!). A quantifier helps us with repeated regexes, so we could write this as [A-Z]{3}\d{4} instead. Quantifiers also allow ranges, so if we knew that the plates we were looking for had 2 or 3 upper case letters, and between 1 and 4 digits, we could write the succinct [A-Z]{2,3}\d{1,4}.

Quantifier | What it does | Example
X{n}       | Matches n copies of X | \d{4} matches 4 consecutive digits
X{a,b}     | Matches between a and b copies of X | \d{2,4} matches any string with 2 to 4 consecutive digits
X{a,}      | Matches a or more copies of X | [a-zA-Z]{5,} matches "words" (strings of letters, not including digits) that are at least 5 characters long
X*         | Matches 0 or more copies of X. Same as X{0,} | .* matches anything (except newlines) as many times as needed. Chocolate.*cookies matches Chocolate cookies, Chocolate chip cookies, and Chocolate-Pecan cookies. Note in this case, Chocolate.*?cookies is probably a better regex, which is explained in "Greedy vs non-greedy matches"
X+         | Matches 1 or more copies of X. Same as X{1,} | a+ matches at least one a, so aaa would match, bac would match, but bbc would not
X?         | Matches 0 or 1 copies of X. Same as X{0,1}. Can be thought of as making X optional | colou?r matches color and colour

Warnings on upper limits

The upper limit on characters can lead to unexpected results. For our license plate example, would you expect the regex [A-Z]{3}\d{2,4} to match the string "JKF224456"? Your first thought might be that it doesn't match, because there are more than 4 digits after the three capital letters. It does match: the matching portion is "JKF2244". The same regex also matches "ABCDEFG1234576". The problem with our match is that we haven't restricted what can appear before and after the four digits. (If you are wondering why the match is "EFG1234" rather than just "EFG12", it is because regexes are greedy by default.)

A solution you might try is to make sure that whatever appears before the three capital letters isn't a capital letter, and whatever follows the digits isn't a digit: the regex [^A-Z][A-Z]{3}\d{2,4}\D. Let's check it on three strings:

To solve this problem more generally we will use anchors (see the next section). The moral for this section is that upper limits on character classes can be misleading. Generally you can only trust them to match the exact number when you have specified what happens on the left and right of the part of the regex with the quantifier.

Anchors

Anchors are generally thought of as "locations" within the string.

Regex | Meaning | Example
^     | Matches at the beginning of the line | ^abc matches occurrences of "abc" at the beginning of the string, or immediately after a newline
\A    | Matches at the beginning of the string (same as ^ if there are no newlines) | \Aabc matches "abc" only at the beginning of the string
$     | Matches at the end of a line | xyz$ matches occurrences of "xyz" at the end of the string, or immediately before a newline
\Z    | Matches at the end of the string (same as $ if there are no newlines) | xyz\Z matches "xyz" at the end of the string
\b    | Word boundary. Exactly one side of this position is a word character (as defined by \w) | \b\w{7}\b matches "iPhone7" and the "Numbers" in "Numbers are great", but not "iPhone12"
\B    | Not a word boundary | \Bman\b matches "woman" and "batman", but doesn't match "mankind" or "man"

For our license plate problem, if we wanted to match strings that were just the license plate, we could use the regex ^[A-Z]{3}\d{2,4}$. If we wanted to look for strings that contained valid license plates, we could use word boundaries with the regex \b[A-Z]{3}\d{2,4}\b.
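As a quick sketch of the difference the word boundaries make, using Python's re module (the test strings are just illustrations):

import re

plate = re.compile(r'\b[A-Z]{3}\d{2,4}\b')

print(bool(plate.search("JKF2244")))       # True: a plate on its own
print(bool(plate.search("see JKF2244!")))  # True: a plate inside a larger string
print(bool(plate.search("JKF224456")))     # False: too many digits
print(bool(plate.search("ABCDEFG1234")))   # False: too many letters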

Greedy vs non-greedy matches

By default, a regex will try and make the largest match it possibly can. This is referred to as a greedy match. Consider the following string:

"I like chocolate chip cookies, chocolate pecan cookies, as well as raisin cookies"

What does the regex chocolate.*cookies match? By default, we have a greedy match, so what is the longest string that matches this pattern? The longest substring that starts with "chocolate" and ends with "cookies" is

"chocolate chip cookies, and chocolate pecan cookies, as well as raisin cookies"

The intent of this regex was probably to match all types of chocolate cookies. We can modify the regex by placing the ? modifier after *. The regex chocolate.*?cookies makes the smallest possible matches, so we end up matching "chocolate chip cookies" and "chocolate pecan cookies" within the string:

"I like chocolate chip cookies, chocolate pecan cookies, as well as raisin cookies"

Look-arounds

Usually when a regex makes a match (or an exclusion), later parts of the regex only reference later parts of the string. The purpose of a look-ahead regex is to determine if the rest of the string matches the regex, and then returns to the original position. A look-behind regex checks if positions prior to the current position match, without changing the current position.

There are 4 types of look-arounds: positive look-ahead (?=...), negative look-ahead (?!...), positive look-behind (?<=...), and negative look-behind (?<!...).

Examples

Regex | What it does
\b[A-Z]\w*\b | Matches capitalized words, where a word consists of letters, digits, and underscores
\b(\w+)\s+\1\b | Matches two repeated words, such as "In the the cupboard" (useful for finding typos)
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#\$\!\^]).{6,20}$ | Matches strings that are between 6 and 20 characters and contain at least one digit, one lowercase letter, one uppercase letter, and one of the special characters "@#$!^". Useful as a password validator
s/\b\d{2}(\d{2})\b/$1/g | Replaces 4-digit numbers with their last two digits (e.g. 2001 -> 01). Doesn't replace numbers joined to words (e.g. ComicCon2012 would remain unchanged). Perl-style replacement
(["']).*?CodeFights.*?\1 | Matches quotes about CodeFights (i.e. CodeFights contained by either matching single or double quotation marks)
[\(]?\d{3}[\)]?[\s-]*\d{3}[\s-]?\d{4} | Matches 10-digit phone numbers, with optional brackets around the area code and optional spaces and dashes between the groups

Number Theory

Interview Essentials

Number theory is the theory of integers. Knowing key results from number theory can help you write efficient algorithms. Number theory has many somewhat esoteric facts that are simple once you see them, but are difficult to discover "on the spot". Fortunately, interviewers only expect you to have experience with a small subset of number theory.

You should practice calculating series (particularly the Fibonacci series), using modular arithmetic, and finding the greatest common divisor between numbers. You should also know a technique for generating prime numbers, such as the Sieve of Eratosthenes.

Your interviewer is probably happy to provide you with definitions of unfamiliar concepts in an interview, but being familiar with the topics above will help you understand your task much more rapidly than you would if you were going in with just the definitions.

Unless you're interviewing at a company that implements encryption, you're unlikely to see pure number theory problems in technical interviews. Instead, you can use results from number theory to help you solve your main problem.

Non-recursive series

Some of the standard series results allow you to figure out the sum of n objects in O(1) instead of O(n). Some of the more commonly occurring series are:

\sum_{i=1}^n i = \frac{n(n+1)}{2}

\sum_{i=1}^n i^2 = \frac{(2n + 1)(n+1)n}{6}

\sum_{i=1}^n i^3 = \frac{n^2(n+1)^2}{4}

\sum_{i=1}^n r^{i-1} = \frac{1-r^n}{1-r} \quad (r \neq 1)

Primes

You should be able to:

The simplest algorithm for finding primes is the Sieve of Eratosthenes. It is often introduced as a way of finding all the primes less than or equal to N in the following way:

  1. Write down all the numbers from 2 ... N.
  2. Remove the smallest number in the array and store it as a prime, and then "cross out"/remove all higher multiples of that number up to N.
  3. If there are numbers that have not been removed (either because they are primes, or multiples of primes), go back to step 2.

There are better approaches than implementing the algorithm above directly. The main drawbacks of a direct implementation are that you have to allocate space for all the integers 2 through N, and that you have to know in advance where you want to stop.

An iterative implementation of the Sieve is:

  1. Start with primes = [2, 3] and candidate = 5.
  2. Check whether candidate is divisible by anything in the list primes. If it isn't, append it to primes.
  3. If we still need more primes, increase candidate by 2 (i.e. to the next odd number) and go back to step 2. Typical stopping conditions are:

The advantages over the naïve implementation are that you don't need to keep all the integers in your Sieve in memory, and it becomes easier to implement a custom stopping condition. For example, you could use this method to generate primes until you found one containing the consecutive digits 314 (the first three digits of π).

(In case you're curious, the first prime with 314 to appear in it — in base 10 — is 13147.)
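A sketch of the iterative approach with that custom stopping condition (primes_until is an illustrative name; the check uses simple trial division against the primes found so far):

def primes_until(stop_condition):
    # generate primes until stop_condition(latest prime) is true
    primes = [2, 3]
    candidate = 5
    while not stop_condition(primes[-1]):
        # candidate is prime iff no previously found prime divides it
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 2  # past 2, only odd numbers can be prime
    return primes

primes = primes_until(lambda p: '314' in str(p))
print(primes[-1])  # 13147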

Finding the greatest common divisor (Euclid's algorithm)

A number d is a divisor of n if n / d is an integer. Given two numbers m and n, the greatest common divisor gcd(m,n) is the largest number that is a divisor of both m and n. The GCD doesn't exist if m and n are both zero.

Note: Many number theory references won't assume a notion of division, because it is possible for n / d to be a non-integer even when n and d are both integers. So don't be surprised if you see the equivalent definition that "d is a divisor of n if there is some other integer q such that d*q == n".

To find the GCD of m and n, it helps to know that for any pair of positive integers m and n, we can write:

 m = q n + r,   q an integer,    0 ≤ r < n

Here q is called the quotient (the "number of times n divides m") and r is the remainder. In many computer languages that do integer division, we have:

q = m / n (integer division) = floor(m/n) (math division)

r = m % n (modulus operation)

Recast in this language, a number d divides n if and only if n % d == 0.

The key insight to finding the GCD of m and n is that if a number d divides both m and n, then it must also divide the remainder, because

r = m - q·n, and any d that divides both m and n divides the right-hand side, and hence divides r.

Our goal is to find the largest such d. A couple of observations help:

  1. If r is zero, then n divides m. Therefore gcd(n,m) = n, since the greatest divisor of any positive integer n is the integer itself.
  2. If r isn't zero, then we can look for the greatest common divisor between n and r. Since this greatest divisor can be factored out, it is also a divisor of m.

Taken together, these observations imply:

gcd(m,n) = gcd(n, r),   where r = m % n

The first case is true because gcd(n,0) = n whenever n is a positive integer. We have to use this as a base case because we are writing an algorithm that works on positive integers.

This gives a recursive method for finding the GCD of two integers:

def gcd(m, n):
    n, m = abs(n), abs(m)
    # gcd(0, 0) is not defined
    if n + m == 0:
        return float('NaN')
    # base case: if m or n is zero, the other one is the gcd
    if n * m == 0:
        return n + m
    return gcd(n, m % n)

This algorithm is known as Euclid's algorithm, and it can be shown to make at most 5 log10(min(m,n)) + 1 recursive calls.

An important special case is when gcd(a,b) = 1, in which case the numbers a and b are called coprime. They have no factors in common except 1, and every integer has 1 as a factor.

Modular arithmetic

Modular arithmetic looks at addition and multiplication of integers that "wrap around" a certain value called the modulus. This is sometimes called "clock arithmetic" because it is similar to the way that clocks work. For example, if it is now 20:00 (i.e. 8 PM) and we wait 6 hours, then it is 02:00 (i.e. 2 AM) even though 20 + 6 = 26. Twenty-four hour time "wraps around" at 24 hours.

Technically, modular arithmetic deals with equivalence classes of integers. We call two numbers "congruent modulo m" if they have the same remainder after division by m. Using our example above, 26 and 2 are congruent modulo 24, because they both have remainder 2 after division by 24.

From a pragmatic (and interview-centric) point of view, we can think of modular arithmetic as "remainder arithmetic". That is, we are considering the addition and multiplication of remainders, where the remainders are elements of {0, 1, 2, 3, ..., m-1}.

If you use a language that has fixed size unsigned ints, then technically all integer arithmetic being performed with this native data type is modular arithmetic. (You can always build a BigInt class, or use a library if you need arbitrarily sized integers.) The signed ints also wrap around and would be examples of modular arithmetic, but the negative numbers mean we couldn't consider this the "arithmetic of remainders". One of the advantages offered by modular arithmetic is that you have an upper limit on the size of the integers you have to process, so particular operations are fast.

For addition, subtraction, and multiplication mod m, we can use the "normal" operation, and then take the modulus at the end. We can also just deal with the remainder of the inputs, and get the same value:

Operation mod m        | Normal operation, then mod m | Mod the inputs first
Add a and b mod m      | (a + b) % m                  | ((a % m) + (b % m)) % m
Subtract a and b mod m | (a - b) % m                  | ((a % m) - (b % m)) % m
Multiply a and b mod m | (a * b) % m                  | ((a % m) * (b % m)) % m

If a and b are much larger than m, then the "mod the inputs first" column can be much more efficient, since you only perform the arithmetic operation on numbers smaller than m. If you already know that a and b are numbers between 0 and m-1 (inclusive), the middle column is more efficient.

There are a few counter-intuitive properties of modular arithmetic. Here are two of the more common ones. For the following, assume that 0 ≤ a, b < m (i.e. a and b have already been converted into "remainders").

  1. A product can be zero mod m without either factor being zero. What we can say is that if (a*b) % m == 0, then a*b is a multiple of m, so a or b must share factors with m. If all the factors are in a, then a must be zero (we're dealing with the remainders here). Similar comments apply to b.
  2. If a and m are coprime (i.e. gcd(a,m) = 1, so a has no non-trivial factors in common with m), then (a*b) % m == 0 tells us that b is a multiple of m, which means b = 0 once b has been converted to a remainder.

Division is the operation that behaves least like you might expect. Working mod 6:

  1. (3*a) % 6 = 0 has the solutions a = 0, a = 2, and a = 4. Treating this like a regular equation (3*a = 0, so a = 0) misses the solutions a = 2 and a = 4.
  2. (3*a) % 6 = 1 has no solutions. For integer a, 3*a is always a multiple of 3, and any multiple of 3 is either divisible by 6 or has a remainder of 3 after dividing by 6; neither case leaves a remainder of 1.
  3. (3*a) % 6 = 3 has multiple solutions (namely a = 1, a = 3, and a = 5).

To replace division, we look for multiplicative inverses. A number a⁻¹ is the multiplicative inverse of a (modulo m) if (a⁻¹ · a) % m = 1. We can replace the idea of dividing by a with the notion of multiplying by a⁻¹, if it exists. The multiplicative inverse of a modulo m exists if and only if gcd(a,m) = 1. If there is a multiplicative inverse of a, you can be guaranteed that (a * x) % m = b has a unique solution for x ∈ {0, 1, ..., m-1}.

In particular, when working modulo p for a prime p, we are guaranteed that every non-zero number 1, 2, ..., p-1 has a multiplicative inverse modulo p.
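As a sketch: in Python 3.8+, the built-in pow accepts a negative exponent together with a modulus and computes modular inverses directly (it raises ValueError when gcd(a, m) != 1):

# multiplicative inverse of 3 modulo 7 (gcd(3, 7) = 1, so it exists)
inv = pow(3, -1, 7)
print(inv)       # 5, because (3 * 5) % 7 == 1

# "dividing by 3" mod 7: solve (3 * x) % 7 == 4
x = (inv * 4) % 7
print(x)         # 6, because (3 * 6) % 7 == 4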

Recursive sequence (with and without modular arithmetic)

A recursive sequence has two types of terms: base cases, which are given explicitly, and recursive terms, which are defined in terms of earlier elements of the sequence.

One of the most well-known recursive sequences is the Fibonacci numbers F_i, where:

 F_i = F_{i-1} + F_{i-2}        (i > 1),          F_0 = 0,    F_1 = 1

The first two terms are the base case. From the base case, we can use the recursive step to find the other numbers in the sequence.

A related sequence, called the Lucas numbers L_i, is defined by the following relation:

 L_i = L_{i-1} + L_{i-2}        (i > 1),          L_0 = 2,    L_1 = 1

In other words, the recursive step is exactly the same, and only the base cases have changed.

While it's generally easy to write a recursive function directly from the definition, such functions often have poor runtimes. A recursively defined sequence whose terms depend on a constant number j of previous terms has a runtime of O(j^n) for the nth term when computed by direct recursion. (For the Fibonacci and Lucas sequences, j = 2.) Using memoization, or an iterative approach, often works better: either can generate the term a_n in O(n), assuming that operations such as multiplication and addition take constant time (which isn't technically true, and can become a major issue as the numbers grow large). A sketch of both approaches follows.
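Both are sketched here for the Fibonacci numbers (function names are ours): the memoized version caches every subproblem, while the iterative version keeps only the last two terms.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # O(n) distinct calls instead of O(2^n), since results are cached
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):
    # O(n) time, O(1) extra space
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_memo(30), fib_iter(30))  # 832040 832040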

Recursively defined sequences mod m

If a recursively defined sequence is defined in terms of the previous j terms, then as soon as a run of j consecutive terms repeats, the whole sequence has to repeat, since the recursive step will always produce the same outputs from the same inputs.

For a specific case, such as the Fibonacci numbers, if we have distinct i and j such that:

  1. F_i = F_j
  2. F_{i+1} = F_{j+1}

then the recursive condition guarantees that F_{i+2} = F_{j+2}, which in turn guarantees that F_{i+3} = F_{j+3}, and so on: F_{i+k} = F_{j+k} for all k ≥ 0.

This doesn't ever happen for the Fibonacci sequence itself, since the sequence is strictly increasing. However, if we look at the Fibonacci sequence modulo m, then for a consecutive pair of terms there are only m^2 possible pairs (m possibilities for the first number, and m for the second). After at most m^2 terms, the Fibonacci sequence mod m has to repeat. In general, if an integer series a_i depends on the previous j values, the series a_i % m has to repeat after at most m^j terms, as demonstrated below.
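A sketch that finds this repeat length for the Fibonacci numbers by waiting for the starting pair (0, 1) to come around again (the function name is ours; assumes m >= 2):

def fib_mod_period(m):
    # period of the Fibonacci sequence mod m; must occur within m*m steps
    a, b = 0, 1
    for period in range(1, m * m + 1):
        a, b = b, (a + b) % m
        if (a, b) == (0, 1):
            return period

print(fib_mod_period(10))  # 60: the last digits of Fibonacci repeat every 60 terms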

Linear Diophantine, or integer, equations

Often we are interested in whether we can satisfy an equation with multiple variables using only integer quantities. An example of this sort of question is the "McNuggets question": If we are able to buy McNuggets in 4 packs, 6 packs, and 9 packs, am I able to buy exactly 41 McNuggets?

Assuming we only allow purchases of whole packs (and these are the only types available) we are asking if it is possible to solve

4 X + 6 Y + 9 Z = 41

where X represents the number of 4-packs, Y the number of 6-packs, and Z the number of 9-packs. The answer is "yes", as X = 2, Y = 1, and Z = 3 works (as do other combinations).

Compare this to the seemingly similar

3 X + 6 Y + 9 Z = 41

which doesn't have any solutions, as the left-hand side is a multiple of three for any integer choices X,Y,Z, while the right-hand side isn't.

If we're just concerned with whether a solution (x_1, x_2, \ldots, x_N) exists for a problem of the form

A_1 x_1 + A_2 x_2 + \ldots + A_N x_N = C

for known A_i, then there is a simple test. There are solutions if and only if

\gcd(A_1, A_2, A_3, \ldots, A_N) \text{ divides } C

Showing that this condition is necessary is easy: as in the 3X + 6Y + 9Z case, the left-hand side is a multiple of the GCD of all the coefficients A_i, so for there to be a solution the right-hand side must be as well. It's a little trickier to show that this is also a sufficient condition.

The extended Euclidean algorithm solves this problem for two variables. Given two integers a and b, the extended Euclidean algorithm outputs x, y, and gcd(a,b), where x and y satisfy

a x + b y = gcd(a,b)

In contrast, the standard Euclidean algorithm only outputs gcd(a,b). To see that this solves any equation of the form A X + B Y = C: there is only a solution if gcd(A,B) divides C, and if that isn't the case, we simply report no solutions. Otherwise, multiplying the equation solved by the extended Euclidean algorithm by C/gcd(A,B) shows that X = C·x/gcd(A,B) and Y = C·y/gcd(A,B) are solutions to A X + B Y = C.

The recursive algorithm for the extended Euclidean algorithm can be made by first identifying a base case. Since a and b appear symmetrically in ax + by = gcd(a,b), suppose b is the smaller of the two, and that a > 0 and b ≥ 0. We have the base case b = 0: then gcd(a, 0) = a, and (x, y) = (1, 0) is a solution.

If we don't have the base case, we need to coerce the second argument toward zero.

Writing a = qb + r, we can substitute into our original equation:

ax + by = gcd(a,b)  ⟹  (qb + r)x + by = gcd(a,b)

Rearranging slightly:

b(qx + y) + rx = gcd(a,b) = gcd(b,r)

That is, setting X = qx + y and Y = x, our equation is bX + rY = gcd(b,r), which we can solve with a recursive call to the extended Euclidean algorithm. We have replaced a with b (and we picked b to be the smaller coefficient), and we have replaced b with r (where the remainder is guaranteed to be less than b), so the second argument shrinks with every call and must eventually reach 0, the base case. Once we have (X, Y), we can reconstruct the original solution as (x, y) = (Y, X - qY).

Here is a recursive implementation of the extended Euclidean algorithm:

def extended_gcd(a, b):
    """Returns (gcd(a,b), x, y) as a solution to a*x + b*y == gcd(a,b)."""
    # error: a and b are both zero, gcd not defined
    if abs(a) + abs(b) == 0:
        return (float('NaN'), float('NaN'), float('NaN'))
    # base case
    if b == 0:
        return (a, 1, 0)
    # recurse, sending the second argument to zero first
    # because of the way the base case is arranged
    gcd_ab, x, y = extended_gcd(b, a % b)
    return (gcd_ab, y, x - (a // b) * y)

Note that there are a few details from our original motivation that didn't make it into the code. For example, we never explicitly check that b is smaller than a: once the first recursive call extended_gcd(b, a % b) has been made, the second argument is smaller than the first. Once a solution (x, y) has been found, other solutions can be found via

x_i = x + b·i,    y_i = y - a·i    (for any integer i)

With multiple variables, a "merge and conquer" approach can be taken, as sketched below. For example,

A X + B Y + C Z = N  ⟹  A X + gcd(B,C)·(b Y + c Z) = N,   where b = B/gcd(B,C) and c = C/gcd(B,C),

can be solved using the extended Euclidean algorithm for X and Q = bY + cZ. Once complete, the extended Euclidean algorithm can be used again to solve bY + cZ = Q.
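As a sketch of the merge-and-conquer idea for the McNuggets equation 4X + 6Y + 9Z = 41, using the extended_gcd function above (solve_two_var is an illustrative helper):

def solve_two_var(a, b, c):
    # one integer solution (x, y) to a*x + b*y = c, or None
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None  # no solutions unless gcd(a, b) divides c
    return (x * (c // g), y * (c // g))

# merge 6Y + 9Z into gcd(6, 9) * Q = 3Q, where Q = 2Y + 3Z,
# then solve 4X + 3Q = 41 followed by 2Y + 3Z = Q
X, Q = solve_two_var(4, 3, 41)
Y, Z = solve_two_var(2, 3, Q)
print(X, Y, Z)              # one valid (not necessarily non-negative) solution
print(4*X + 6*Y + 9*Z)      # 41

The solution this produces won't generally be the non-negative one; shifting by the homogeneous solutions from the formula above is how you would search for one.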

Counting

Interview Essentials

Many interview questions are dependent on you being able to calculate the number of something, or to calculate the number of arrangements that satisfy particular properties. Often these numbers are huge, and it isn’t feasible to generate all the solutions by brute force.

Interviewers are looking to see if you know how to look at a problem and determine whether the order of items in an arrangement is important. Once you have determined what to count, they will be looking for your ability to take a naive solution and optimize it using more advanced methods.

The Multiplication Principle

If there are n_1 ways to select item 1, and n_2 ways to select item 2, then there are n_1·n_2 ways to select pairs of items (if the order matters).

To convince yourself of this, select a particular item for item 1. Then there are n_2 choices for item 2. A different choice of item 1 will still have n_2 choices for item 2. Since there are n_1 different choices for item 1, and each choice contributes n_2 new pairs, there are n_1·n_2 distinct pairs.

As an example, suppose we had to choose a letter from {A, B, C, D} (item 1) and a number from {1, 2, 3} (item 2). Since there are 3 possible numbers for each of the 4 letters, there are twelve possible pairs:

Letter | Number
A      | 1
A      | 2
A      | 3
B      | 1
B      | 2
B      | 3
C      | 1
C      | 2
C      | 3
D      | 1
D      | 2
D      | 3

Important: The multiplication principle only holds if the number of choices for each item doesn't depend on the item chosen. Here are two examples:

  1. Pick a letter from {A, B, C} and a number from {1, 2, 3}, where the number cannot be the 1-based index of the letter in the alphabet (i.e. A1, B2, C3 are excluded). The multiplication rule does hold: regardless of which of the 3 letters we pick, there are 2 numbers we can use, so there are 3 × 2 = 6 possible combinations.
  2. Pick a letter from {A, B, C, D} and a number from {1, 2, 3}, with the same restriction. The multiplication rule doesn't hold: if we pick 'A', 'B', or 'C' there are 2 numbers we can choose, but if we pick 'D' there are 3 numbers we can pick from.

The Addition Principle

If we are counting items that are in non-overlapping classes, the total number of items is just the sum of the items in each class. A simple example: if the only places to eat in town are 3 fancy restaurants and 2 fast food joints, then there are 3 + 2 = 5 total places to eat (assuming that no fast food joint qualifies as "fancy").

If there are overlaps, then we need to subtract all the items that are double-counted. For example, if we have 8 pasta dishes, and 4 spaghetti dishes, then we only have 8 dishes, not 12, since every spaghetti dish is also a pasta dish. Our addition method of counting for two classes is:

\text{(Things in class 1 or 2)} = \text{(Things in class 1)} + \text{(Things in class 2)} - \text{(Things in both class 1 and 2)}

If the classes don't overlap, we get simple addition. In the case of more than two classes with overlaps, the more complicated inclusion-exclusion principle tells us how to systematically count.

The addition principle can be used alongside the multiplication principle. Taking our previous example of picking a letter from {A, B, C, D} and a number from {1, 2, 3} with the restriction that we cannot pair a letter with its 1-based index in the alphabet, we could approach the problem in the following way:

The Subtraction Principle

If we have N items, and a constraint excludes n of them, then we have N - n items from our original set that obey the constraint. For example, suppose we wanted to count the number of days of the week that don't contain the letter 'u' in their English names. We could either count the days without a 'u' directly to get 3 (Monday, Wednesday, Friday), or count the days that do have a 'u' (Tuesday, Thursday, Saturday, Sunday) and subtract that number from 7: 7 - 4 = 3.

In the "days of the week" example, both approaches are approximately the same amount of work. If we wanted to count the number of "non-February" days in a year, then we could start with the 365 days in a year and subtract the 28 days in February to get 365 - 28 = 337. For leap years we have 366 - 29 = 337 non-February days as well.

The subtraction principle is most useful when combined with the addition or multiplication principles. Going back to the example of picking a letter from {A, B, C, D} and a number from {1, 2, 3} such that we don't pair a letter with its 1-based index in the alphabet, we can count the following way:

Permutations

A permutation of n objects is a way of ordering those objects.

Permutations of Distinct Objects

If they are all distinct, then there are n! distinct permutations. This follows directly from the multiplication principle. Imagine selecting a particular permutation in order (without replacement):

The multiplication principle tells us that there are

n(n-1)(n-2) \cdots (2)(1) = n!

possible permutations.

Often we are interested in how many permutations there are if we select fewer than n items. For example, there is a variant of the gambling game Lotto where we have to pick the numbers drawn, in the order they were drawn. If there are 60 possible numbers, and 6 are drawn, we are interested in how many permutations there are of 6 objects drawn from 60. From the multiplication principle:

\prod_{i=55}^{60} i = 60 \times 59 \times 58 \times 57 \times 56 \times 55 = 36{,}045{,}979{,}200

There is a simpler way of writing this by multiplying and dividing by 54!:

\frac{60 \times 59 \times 58 \times 57 \times 56 \times 55 \times 54!}{54!} = \frac{60!}{54!}

There is an alternative way of thinking of this formula. Imagine the permutations of all 60 objects (of which there are 60!). You only care about the first 6 objects drawn, so each way of shuffling the remaining 54 objects is unimportant to you. So each permutation of 6 objects you care about appears on the list of 60! total permutations 54! times. By dividing 60! by the 54! repeats, you get the number of permutations of the first 6 objects.

In general, the formula for counting the number of permutations when selecting r objects from N is:

{}^N P_r = \frac{N!}{(N - r)!} = \prod_{i=N-r+1}^{N} i

Important: The factorial notation N!/(N-r)! is tidier, and easier to use when simplifying expressions. If you are implementing a permutation-counting function on a computer, the "product" form is preferred: there are fewer multiplications, and you are less likely to cause an overflow error during the calculation. For example, when calculating the "trivial" number of permutations of 2 objects drawn from 60, computing 60 × 59 = 3540 is not going to overflow, but calculating the mathematically equivalent 60!/58! requires being able to handle integers up to about 8 × 10^81.

Permutations of Objects with Duplicates

In interview problems, you may see questions where you have identical objects, such as:

The general approach to these questions is to over-count, and then determine by what factor you have over-counted. To find the number of distinct anagrams of "ANAGRAMS", note that "ANAGRAMS" is an 8 letter word. If all the letters were distinct, then there would be 8! = 40320 anagrams. Because there are 3 'A's in the word, any permutation among only the "A"s leaves the string the same. Every valid anagram has 3! ways of rearranging the "A"s, so

\text{Num distinct anagrams of "ANAGRAMS"} = \frac{8!}{3!} = 6720

When selecting the 7 balls from the bag, if they were each distinct then there would be 7! = 5040 different permutations. Because we can shuffle the red balls among themselves 3! ways, and the blue balls among themselves 4! ways, the multiplication principle tells us there are 3! × 4! ways of rearranging the balls without changing the order in which the colors appear. Therefore:

\text{Num distinct permutations of 3 red balls and 4 blue balls} = \frac{(3+4)!}{3! \, 4!} = \frac{7!}{3! \, 4!} = 35

The general formula: if you have N items, and each distinct item i appears n_i times, then

\text{Num of permutations of all items} = \frac{N!}{n_1! \, n_2! \cdots n_j!}

where there are j distinct items. Note that N = \sum_{i=1}^j n_i. In the context of our examples:

\frac{8!}{3! \, 1! \, 1! \, 1! \, 1! \, 1!} = 6720

\frac{7!}{3! \, 4!} = 35

A more challenging permutation problem might be to ask for the number of different possible permutations of 5 balls from the collection of 4 blue balls and 3 red ones. This is tricky because we don't know out of the 5 selected balls how many are red and how many are blue. One way of approaching this problem is to list how many red balls and how many blue balls we could have to make 5 total, and then use our formula for permutations to determine how many permutations there are of each of these arrangements:

Red balls | Blue balls | Number of distinct permutations
1         | 4          | 5!/(1! × 4!) = 5
2         | 3          | 5!/(2! × 3!) = 10
3         | 2          | 5!/(3! × 2!) = 10

Since each of these selections is non-overlapping (no arrangement with exactly 3 red balls is the same as an arrangement with exactly 2 red balls) we can apply the addition rule to get 25 different possible permutations of 5 balls.

Writing a Function for Permutations

Sometimes we actually need to generate the permutations. Even if we are only looking to count the number of permutations that satisfy some condition, we may need to look at the actual permutation to determine if it satisfies our constraints. Be warned that the number of permutations of n distinct objects is n!, which grows extremely rapidly, so this is not normally a scalable solution.

It can be practical if you can write your constraints in such a way that you can determine if there are any feasible permutations by looking at a prefix of a permutation. That way you can stop iterating down a particular branch early.

For languages that have generators, we can generate all permutations in a memory friendly way. Using Python as the prototype language:

def permutation(my_list, start_index=0):
    if start_index == len(my_list):
        yield my_list
    else:
        for index in range(start_index, len(my_list)):
            # if my_list[:start_index] isn't viable we can break
            # and not compute the remaining permutations
            my_list[start_index], my_list[index] = my_list[index], my_list[start_index]
            for perm in permutation(my_list, start_index + 1):
                yield perm[:]
            my_list[start_index], my_list[index] = my_list[index], my_list[start_index]

The nice feature of this approach is that the prefix is the list up to and including start_index, which allows us to prune branches and "exit early" if we can determine that it is impossible to construct permutations that satisfy our constraints.

(If you are using Python, rather than just a generator-friendly language, and are allowed to use whatever external libraries you like, you should definitely use itertools.)

If you are using a language without generators, then it is probably easier to use an accumulator pattern, where the permutations are stored in results.

def permutation(my_list, results, start_index=0):
    if start_index == len(my_list):
        return my_list[:]
    else:
        for index in range(start_index, len(my_list)):
            # again, we can test the prefix here and return from the
            # function early if we can determine that permutations with
            # the current prefix can't satisfy our constraint
            my_list[start_index], my_list[index] = my_list[index], my_list[start_index]
            new_perm = permutation(my_list, results, start_index + 1)
            if new_perm:
                results.append(new_perm)
            my_list[start_index], my_list[index] = my_list[index], my_list[start_index]

Combinations

Combinations are arrangements of items where what matters is what is selected, not the order in which they are selected. For example, when selecting optional extras for your car, it doesn't matter which order you select them in. If you are picking up fruit for a fruit salad, it doesn't matter which order you selected the fruit.

Sometimes emotional aspects of decisions can cloud whether or not order matters when counting arrangements. For example, when counting the number of possible soccer teams we can make, we would be interested in which group of 7 people are all playing together, not the order they were picked. However, when actually making the soccer team, we are used to the order being important (the better players usually get selected first). When counting the number of possible teams, order doesn't matter. When actually making the teams on the field, order does matter!

With permutations, we talked about the number of permutations of n distinct objects (of which there are n!) and the number of permutations where we select r of n objects (of which there are {}^nP_r = n!/(n-r)!). For combinations, there is only one combination of n objects (simply take all of them). The more interesting question is how many combinations there are when we take r objects from n. There are two notations for this: {}^nC_r and {n \choose r}. With our knowledge of permutations, we can find the formula for the number of combinations easily:

\text{Number of ways of picking r out of n distinct objects} = {n \choose r} = \frac{n!}{(n-r)! \, r!}

We see that it is the number of permutations we could make (n!/(n-r)!) divided by the number of different ways we could have arranged those r objects. This makes sense because each of the r! different permutations of the r objects is still the same combination, so every combination has been over-counted r! times.
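A sketch of both counting and generating combinations (in production Python you would reach for math.comb and itertools.combinations, which do exactly this):

from math import comb

print(comb(60, 6))  # 50063860 ways to choose 6 of 60 when order doesn't matter

def combinations(items, r):
    # yield every r-element combination of items, in order
    if r == 0:
        yield []
    elif len(items) >= r:
        # combinations that include the first item...
        for rest in combinations(items[1:], r - 1):
            yield [items[0]] + rest
        # ...and combinations that skip it
        for rest in combinations(items[1:], r):
            yield rest

print(list(combinations([1, 2, 3], 2)))  # [[1, 2], [1, 3], [2, 3]]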

Probability, Combinations vs Permutations

When there are identical objects, it becomes a lot trickier to give a general prescription for counting objects. For example, when selecting 5 balls from 3 red balls and 4 blue balls, there were 3 combinations: (1 red, 4 blue), (2 red, 3 blue), and (3 red, 2 blue).

Each of these different combinations had a different number of permutations.

When asked probability questions, it is usually assumed that each permutation is equally likely, but we are interested in the probability of a particular combination. If we took our bag of 3 red balls and 4 blue balls and asked for the probability that a "random draw" of 5 balls would have 1 red ball and 4 blue balls, the expected answer would be 1/5, because 5 of the 25 possible permutations make this combination.

A more extreme example is looking at the number of heads you get when flipping 100 coins. There are 101 different combinations (0 heads, all the way up to 100 heads), but 2^100 different permutations. Assuming the coin is fair, the chance of a particular total number of heads coming up is equal to the fraction of permutations that have that many heads.

What to Expect for an Interview

You should be comfortable writing your own permutation and combination functions. Even outside the realm of counting, it is one of the more challenging recursive problems you can expect to see on an interview. Most places you interview will want you to have some idea about the difference between the ideas of combinations and permutations, but only companies whose work is highly mathematical or algorithm-driven will expect you to have memorized the formula for counting the number of permutations and combinations. It's in the Extra Credit plan of Interview Practice for a reason!

Here are the techniques related to generating combinations and permutations that you should know:

Geometry

Interview Essentials

In technical interviews, geometry problems act as both general knowledge questions (do you remember what a scalene triangle is?) and as a source for challenging coding questions. From the interviewer's perspective, geometry questions are useful because they don't require much explaining or many data structures beyond "points" and "lines".

An interviewer is interested in how effectively you use your knowledge about the problem to eliminate solutions. When looking for the intersection of multiple lines, how do you deal with the approximate nature of floating point arithmetic - or do you start with a more robust approach? When looking at a collection of N points to see if four of them can make a rectangle, do you iterate through all the different ways of connecting each set of four, or do you eliminate possibilities using the distances between the points first? Do you check the corner cases (no pun intended)?

Crucial Terms

Interviewers are usually happy to explain geometric terms or concepts that you’re not familiar with, but in general you should be very familiar with the following terms.

Important Concepts

As with any interview question type, you should always be thinking about optimizing the time and space complexity of your solution. For instance, if you're trying to determine whether you can connect 4 points out of a collection of N points to create a parallelogram, a naive solution might be to iterate through every possible set of 4 points, of which there are O(N^4), then sort the edge lengths, looking for two pairs of edges of the same length where those 4 edges connect only four points. A better solution would be to eliminate some combinations right away using the distances between the points.

For An Interview, Expect To See Problems Like...