When I look at computers I would at least have some intuition how it all works in theory. There was bunch of theories we had to learn in university to reason with the mathematical models of it's computation. However, I had no idea theoretically how the quantum computers are supposed to be faster. So I went down the rabbit hole and I want to share what I have learnt so far before I forget it all.
The intuition
Most explanations on how quantum computers work are usually presented from a physicist's perspective. We often hear that quantum computers have qbits which can be both 0 and 1 at the same time, called a superposition. And when we measure it using a filter, it collapses to either 0 or 1.
Based on that explanation itself, I got the wrong idea that quantum computers can represent 3 distinct states any given time. But this doesn't really explain how it would outperform classical computers by a substantial margin. In complexity theory terms, the ability to represent n bits in $2^n$ vs $3^n$ isn't that significant.
The important piece of information that's often missing is that each qbits has some probability to be 0 and some probability to be 1 at the same time -- in other words a qbit can be both 0 and 1 with some probability to collapse to either one state when we measure it. We exploit this quantum property in our favor.
Trying to explain how quantum computers work with metaphors and analogies (like in pop science articles) makes it hard for us grasp the intuition behind quantum computers. I think our natural language is not equipped to deal with the level of intricacy required in the quantum world. So let's start with something more formal.
Single cbit vs Single qbit
Let's consider a classical bit (cbit) with the value 0 which can be written as $\begin{pmatrix} 1 \\ 0\end{pmatrix}$ or $| 0 \rangle$ using Dirac vector notation. A cbit with the value 1 can be written as $\begin{pmatrix} 0 \\ 1\end{pmatrix}$ or $| 1 \rangle$. Formally, a cbit can represented by $\begin{pmatrix} a \\ b \end{pmatrix}$ where $a$ and $b$ are numbers and $\lVert a \rVert^2 + \lVert b \rVert^2 = 1$.
We can use the same notation to represent quantum bits (qbit). However, the qbits lives in a richer space where it exists as both 0 and 1 over a probabily of each bit collapsing to one or another. We denote that probability coefficients instead of binary 0s and 1s.
For example, a qbit $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ has a 100% chance of collapsing to 0, and a qbit $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ has a 100% chance of collapsing to 1. Similarly, a qbit $\begin{pmatrix} 0.25 \\ 0.75 \end{pmatrix}$ has a 25% chance of collapsing into 0 and 75% chance of collapsing into 1. Accordingly, a qbit $\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$ has a 50% chance of collapsing to either 0 or 1. We can summarize this as, if a qbit has value $\begin{pmatrix} a \\ b \end{pmatrix}$ then it collapses to 0 with probability $\lVert a \rVert^2$ and 1 with probability $\lVert b \rVert^2$.
Two cbits vs two qbits
Two cbits can exist in any one of $|00\rangle, |01\rangle, |10\rangle or |11\rangle$ states; it's important to emphasise that only one of the those state can exist at any given time. It's probable states can represented as the tensor products as shown below.
$| 00 \rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$
$| 01 \rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$
$| 10 \rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}$
$| 11 \rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$
A single qbit can exist as both $|0\rangle$ and $|1\rangle$. Two qbits exists in as all four states $|00\rangle, |01\rangle, |10\rangle and |11\rangle$ simultaneously 🤯. For example, qbits $\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$ and $\begin{pmatrix} 0.25 \\ 0.75 \end{pmatrix}$ exists in all four possible states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ simultaneously with probability of $0.125, 0.375, 0.125$ and $0.375$ respectively.
$\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} \otimes \begin{pmatrix} 0.25 \\ 0.75 \end{pmatrix} = \begin{pmatrix} 0.125 \\ 0.375 \\ 0.125 \\ 0.375 \end{pmatrix}$
2 cbits contains 2 bits of information. 2 qbits contains 4 bits of information that represents the current state of the system -- ie you need to give 4 pieces of information (the four coefficients $0.125, 0.375, 0.125$ and $0.375$) to describe the current state.
Eleven cbits vs eleven qbits
If you have $n$ qbits you get to work with $2^n$ probabilities. So every single additional qbit doubles the number of probabilities you get work with. Let's see why this is a big deal.
Imagine if you have 11 cbits and 11 qbits. With 11 cbits, we can store a single number at any given time, eg. 2019 as $|11111100011\rangle$. With 11 qbits, we can store all the numbers from 1 to 2048 ($2^{11}$) simultaneously, over a probability distribution. With just 300 qbits we could perform more calculations at once than there are atoms in the observable universe 🤯 🤯.
Operations on cbits vs qbits
When we instruct a quantum computers to do something, we're just changing these probabilities to favor some computation that we're trying to do. We can combine multiple primitive logic gates like AND, OR, XOR and NAND to perform operations on cbits to compute almost anything. Similarly, we can use quantum logic gates like NOT, CNOT, Z and Hadamard gate to perform quantum operations. By carefully manipulating the probabilities so that correct answers are amplified and incorrect ones cancel out, a quantum algorithm can solve exponential problems in polynomial time, eg. in simulating complicated chemical interactions.
We will have to take some classical bits, put them into quantum computer, do a bunch of quantum operations. And at the end of it, we can collapse them to zero or one, so that our whole computation is deterministic. We cannot measure all the states in a quantum computer but we can measure few final states to get the answer that we are after. We usually run the program multiple times to gather statistics to deal with the noise caused by stability issues with current gen quantum computers.
Quantum computers are in still in their early days and there are plenty of Turing Awards waiting to be collected.