In going from classic ML to quantum ML you need to learn and accept a few things. Comprehending why these principles hold and making sense of them can be a challenge. I certainly encourage you to learn about the history and the experiments behind the principles, but beware: there are impenetrable mysteries nobody has ever returned from.


You need to accept that nature is probabilistic at its deepest level. Imagine you have a (classic) regression task which predicts the average temperature in your office as a function of where people are sitting or standing. This would mean that you collect position vectors and use them for predictions. In a quantum context you need to accept that people can be everywhere and anywhere at the same time. More precisely, the position of people in the office is a statistical distribution in space rather than a well-defined position vector. Let’s simplify things and assume that you define a line in one dimension where someone can be and assume there is only one person. In a classic context this means you have positions $p_1, p_2, \ldots, p_N$ corresponding to

$x_1 = 1\cdot\Delta, x_2 = 2\cdot\Delta, \ldots$
with $\Delta$ the discrete fixed distance between the points. In a quantum context you have to accept that this person is not necessarily at a specific position $x_i$ but is located there with a certain probability.
Of course, this is not necessarily new to you. If you think of a random walk or, more generically, a Markov process on a graph you have this situation as well. Hold your horses, there is more.


Instead of saying that there is a vector $p_i$ you have to accept that in a quantum context such a vector is denoted by

$$|\,i\rangle$$

This is called a ket. It’s half a bracket in the sense that

$$\langle\,i\,|\,j\,\rangle$$

is a full bracket and that it corresponds to the inner product of two vectors. If vector $p_3 = |\,3\rangle$ and $p_{17} = |\,17\rangle$ then

$\langle\,3\,|\,17\rangle = p_3 \cdot p_{17}$
returning a number. The first part of the bracket is called a bra, no surprise.
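
To make the bracket concrete, here is a minimal sketch in plain Python, assuming kets are stored as lists of complex amplitudes. The helper names `basis_ket` and `braket` and the grid size `N` are illustrative choices, not part of any library.

```python
def basis_ket(i, n):
    """Return the ket |i> as a list of n amplitudes (1 at index i, 0 elsewhere)."""
    return [1.0 + 0j if k == i else 0j for k in range(n)]


def braket(phi, psi):
    """Inner product <phi|psi>: conjugate the bra's amplitudes, multiply, sum."""
    if len(phi) != len(psi):
        raise ValueError("bra and ket must have the same dimension")
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


N = 20  # hypothetical number of grid points on the line
ket_3 = basis_ket(3, N)
ket_17 = basis_ket(17, N)

overlap_different = braket(ket_3, ket_17)  # two distinct positions are orthogonal
overlap_same = braket(ket_3, ket_3)        # a basis ket has unit length
```

The complex conjugation inside `braket` is what turns a ket into a bra; for real amplitudes it reduces to the ordinary dot product.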

Why this new notation? Notation matters. You should have a look at what differential geometry looked like before Einstein came along and invented the so-called Einstein summation convention. Without it things would be unreadable in general relativity.
In the case of quantum mechanics this notation makes it easier to differentiate vectors from dual vectors (more about this below) and is also a reminder that the term ‘vector’ is more generic than the common notion of a tuple.

Infinite dimensions

What is a vector? Every programming language has the notion of a tuple or vector. Mathematically, these things belong to vector spaces and the familiar three dimensional Euclidean space is a well-known example. Vector spaces can however be very different from simple tuples:
– the space of polynomials of finite degree can be seen as a vector space
– the set $\{e^{i\,n x} \mid n = 0,1,2,\ldots\}$ spans a vector space which is infinite dimensional. It plays a major role in Fourier analysis.
– the space of $2\times 2$ matrices is a vector space

So, the bra-ket notation is a reminder that vectors are not necessarily tuples. In fact, most of the time you should think in terms of functions rather than numbers. In general:

forget about vectors as finite tuples of real numbers, think of vectors as complex valued functions in infinite dimensions.

This has several elements:

  • complex valued: the transition from classic ML to quantum ML is the transition from real numbers to complex numbers. You should also think of probabilities $p$ as the squared modulus of a complex number $z$ with $p = |z|^2$.
  • functions: a vector is in general a function and the classic tuple of numbers is a special case corresponding to a function on a finite set of indices (the values being the numbers in the tuple).
  • infinite dimensional: the transition from classic ML to quantum ML is the transition from finite to infinite dimensions. Mathematically it’s the introduction of Hilbert spaces as a generalization of finite dimensional vector spaces with an inner product.
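
As a small illustration of "functions as vectors", the sketch below treats the Fourier modes $e^{i\,n x}$ as vectors and approximates their inner product numerically. The normalization $\frac{1}{2\pi}\int_0^{2\pi}$, the Riemann sum and the names `fourier_mode` and `inner` are assumptions made for this example.

```python
import cmath
import math


def fourier_mode(n):
    """Return the function x -> e^{i n x}, treated here as a 'vector'."""
    return lambda x: cmath.exp(1j * n * x)


def inner(f, g, samples=2000):
    """Approximate <f|g> = (1/2pi) * integral over [0, 2pi) of conj(f(x)) * g(x) dx."""
    dx = 2 * math.pi / samples
    total = sum(f(k * dx).conjugate() * g(k * dx) for k in range(samples))
    return total * dx / (2 * math.pi)


same = inner(fourier_mode(2), fourier_mode(2))       # close to 1
different = inner(fourier_mode(2), fourier_mode(5))  # close to 0
```

Two different modes behave like orthogonal basis vectors, and there are infinitely many of them, which is where the infinite dimensions come from.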


The probability of finding something somewhere is an aspect of stochastic processes (Markov chains in particular) but superposition and interference are not. With respect to our temperature regression, imagine that knowing the position of one person in a room automatically tells you the position of another. This is entanglement and it means that the combined state cannot be factored into separate states. The proverbial total is more complex than the sum of its parts. This is a purely quantum effect and is related to the fact that instead of having a definite position

$$\left|i\right\rangle$$

you have a superposition

$$\left|\psi\right\rangle = \sum_i \alpha_i\left|i\right\rangle$$

with $\alpha_i$ a complex number so that the probability of finding the person at location $i$ is $|\alpha_i|^2$ and hence

$$\sum_i \,|\alpha_i|^2 = 1.$$
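
The relation between amplitudes and probabilities can be sketched in a few lines of plain Python. The amplitudes below are made-up numbers and `normalize` and `probabilities` are hypothetical helper names.

```python
import math


def normalize(amplitudes):
    """Rescale complex amplitudes so that the squared moduli sum to one."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    if norm == 0:
        raise ValueError("the zero vector is not a valid state")
    return [a / norm for a in amplitudes]


def probabilities(state):
    """Probability |alpha_i|^2 of finding the person at each location i."""
    return [abs(a) ** 2 for a in state]


# Hypothetical unnormalized amplitudes for four positions on the line.
psi = normalize([1 + 1j, 2, -1j, 0.5])
probs = probabilities(psi)
total = sum(probs)  # equals one up to rounding
```

Note that the phases of the amplitudes drop out of the probabilities; they only become visible when amplitudes are added, which is where interference comes from.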

Example: quantum marketing

If you think of a web-click as a marketing touchpoint this is a binary event: the user clicked on the link or not. In classic ML you use this as a feature in a classification, a propensity computation and so on. In quantum ML you don’t know for sure whether the user clicked or not:

$$\left|customer1\right\rangle = \alpha\left|1\right\rangle + \beta\left|0\right\rangle$$

with $|0\rangle$ meaning the user did not click, $|1\rangle$ that the user did click, and $|\alpha|^2 + |\beta|^2 = 1$ because the probabilities have to sum up to one. Note that this uncertainty is not linked to noise on the data, uncertainty in the measurement or the like; the quantum customer is intrinsically in a superposition of yes and no. Now imagine that you have another quantum customer on your website with a similar superposition:

$$\left|customer2\right\rangle = \gamma\left|1\right\rangle + \sigma\left|0\right\rangle$$

and let’s call the set $\{|1\rangle, |0\rangle\}$ the basis of our state space. The combined state of customer 1 and customer 2 could be
$\left|1\right\rangle\left| 1\right\rangle$ or $\left|0\right\rangle\left| 1\right\rangle$ and so on. Often you will see the tensor product symbol $\otimes$ to emphasize the combination and order. So, the combination of both customers having clicked on the link would be
$$\left|1\right\rangle\otimes\left| 1\right\rangle = \left|1 1\right\rangle $$
where the right-hand side is simply a shortcut for the left-hand side. Now, all of this seems pretty straightforward but let’s try to figure out in what state the two customers are if their combined state is

$\left|tangle\right\rangle = \frac{1}{\sqrt{2}}(\left|11\right\rangle + \left|00\right\rangle)$.
The square-root constant in front ensures that the total probability is one, as required above. If you now try to factor this state, that is

$\left|tangle\right\rangle = \left|\psi_1\right\rangle \otimes \left|\psi_2\right\rangle$

you get

$\left|\psi_1\right\rangle \otimes \left|\psi_2\right\rangle = (\alpha\left|1\right\rangle + \beta\left|0\right\rangle)\otimes (\gamma\left|1\right\rangle + \sigma\left|0\right\rangle)$

or equivalently

$$\frac{1}{\sqrt{2}}(\left|11\right\rangle + \left|00\right\rangle) = (\alpha\gamma\left|11\right\rangle) + (\alpha\sigma\left|10\right\rangle) + (\beta\gamma\left|01\right\rangle) + (\beta\sigma\left|00\right\rangle)$$
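which is not possible to satisfy, even using complex numbers: the absence of $\left|10\right\rangle$ and $\left|01\right\rangle$ on the left requires $\alpha\sigma = 0$ and $\beta\gamma = 0$, so at least one of $\alpha\gamma$ and $\beta\sigma$ must vanish, while both have to equal $\frac{1}{\sqrt{2}}$.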
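
The factorization argument can also be checked numerically. The sketch below assumes states are stored as amplitude lists in the order $\left|11\right\rangle, \left|10\right\rangle, \left|01\right\rangle, \left|00\right\rangle$ and uses the fact that a two-customer state factors exactly when $c_{11}c_{00} - c_{10}c_{01} = 0$. The function names are illustrative.

```python
import math


def tensor(psi_1, psi_2):
    """Tensor product of two single-customer states given as (amp_1, amp_0).

    Returns the amplitudes in the order |11>, |10>, |01>, |00>.
    """
    alpha, beta = psi_1
    gamma, sigma = psi_2
    return [alpha * gamma, alpha * sigma, beta * gamma, beta * sigma]


def is_product_state(state, tol=1e-12):
    """A two-customer state factors iff c11 * c00 - c10 * c01 vanishes."""
    c11, c10, c01, c00 = state
    return abs(c11 * c00 - c10 * c01) < tol


s = 1 / math.sqrt(2)
product = tensor((s, s), (s, 1j * s))  # hypothetical pair of independent customers
tangle = [s, 0, 0, s]                  # (|11> + |00>) / sqrt(2)

product_factors = is_product_state(product)  # True
tangle_factors = is_product_state(tangle)    # False
```

For any state built with `tensor` the combination $c_{11}c_{00} - c_{10}c_{01}$ is $\alpha\gamma\beta\sigma - \alpha\sigma\beta\gamma = 0$, whereas for the tangled state it equals $\frac{1}{2}$.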

What does this mean? It says that the combined state of the two customers is one that cannot be described as a product of separate states. The customers have merged into something you cannot describe as a simple combination. There is correlation, interference, superposition…it’s a quantum effect.
The tangled state is sometimes called an EPR state, referring to the so-called Einstein-Podolsky-Rosen paradox.

You should note that even without complex maths and in a few paragraphs you end up with something radically different from the classic situation.


If you never know what the real position is or, in general, the real state of something, how can you use it? Measurement is the process of asking something but there is a twist. When you ask, you interfere and change the state of the system. This is the so-called collapse of the wave function.
Assume again

$$\left|customer1\right\rangle = \alpha\left|1\right\rangle + \beta\left|0\right\rangle$$

and ask whether the customer is in state 1. There is a $|\alpha|^2$ chance that this is the case. At the moment you ask, you get the answer and if the customer is indeed in state 1 the new state will be
$\left|customer1\right\rangle = \left|1\right\rangle $.
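
A measurement with collapse can be simulated classically for a single customer. This is only a sketch: the amplitudes are made up, `measure` is a hypothetical helper, and the fixed seed is there purely to make the run reproducible.

```python
import random


def measure(state, rng=random):
    """Measure a single-customer state (alpha, beta) = amplitudes of |1> and |0>.

    Returns the observed outcome (1 or 0) and the collapsed state.
    Assumes the state is normalized.
    """
    alpha, beta = state
    p_one = abs(alpha) ** 2
    if rng.random() < p_one:
        return 1, (1 + 0j, 0j)  # collapsed onto |1>
    return 0, (0j, 1 + 0j)      # collapsed onto |0>


rng = random.Random(42)      # fixed seed only to make the sketch reproducible
customer = (0.6 + 0j, 0.8j)  # |alpha|^2 = 0.36, |beta|^2 = 0.64

# Measuring many identically prepared customers recovers |alpha|^2 as a frequency.
outcomes = [measure(customer, rng)[0] for _ in range(10_000)]
fraction_clicked = sum(outcomes) / len(outcomes)

# Asking again after the collapse always repeats the first answer.
first, collapsed = measure(customer, rng)
second, _ = measure(collapsed, rng)
```

The original amplitudes are gone after the first question: the collapsed state only remembers the answer it gave.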

This interference leads to a whole domain of paradoxes and interpretations. See Roger Penrose’s Shadows of the Mind for an awe-inspiring analysis.

This interference is also the reason that various quantum algorithms and claims around quantum computing have to be read with care. Nature does amazing things on a quantum level but we get only a narrow window. In theory you could create a quantum algorithm which computes all prime numbers but you would be allowed to ask only about a single one.

Dual vector space

The dual of a vector space is the space of linear functions on the vector space. Finite dimensional vector spaces are isomorphic (read: structurally identical) to their dual. While the theory is not very difficult I’ll refrain from reproducing the textbooks here. Rather, let’s take the simple binary yes/no (vector) space. The basis vectors or kets in this space are

$\{\left|0\right\rangle, \left|1\right\rangle\}$

while the dual space has the basis

$\{\left\langle0\right|, \left\langle1\right|\}$.

This immediately makes the term isomorphic clear: same size, same structure, same appearance. When you use a dual vector on a vector you get a number, specifically

$$\langle i \,| \,j \rangle = \delta_{ij}$$

and you can think of this as the inner product of two orthonormal vectors, but remember that this is just a specific example. So, if you apply a general (dual) $\psi$ on $\left|1\right\rangle$ you get
$$\langle\psi|1\rangle = (\alpha\langle0| + \beta\langle1|\,)|1\rangle$$
or equivalently $\langle\psi|1\rangle =\beta$. One says that the state is projected onto $\left|1\right\rangle$. Again, the situation is similar to using inner products in finite dimensional vector spaces in order to obtain the coordinates of an arbitrary vector.

You can go a step further and define an object like this

$$P_1 = |1\rangle\langle1|$$

which acts on $\psi$ like so
$$\langle\psi|P_1 = \beta\langle1|.$$
The operator picks out the $\left|1\right\rangle$ component. The operator works with both vectors and duals; it’s symmetric.
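
The dual vector and the projector can be written out with plain lists. The sketch assumes the component order $(\left|0\right\rangle, \left|1\right\rangle)$, made-up coefficients $\alpha$ and $\beta$, and hypothetical helper names.

```python
def bra_on_ket(bra, ket):
    """Apply a dual vector (row of coefficients) to a ket (column of amplitudes)."""
    return sum(b * k for b, k in zip(bra, ket))


def outer(ket, bra):
    """Build the operator |ket><bra| as a matrix (list of rows)."""
    return [[k * b for b in bra] for k in ket]


def bra_times_operator(bra, op):
    """Row vector times matrix: the bra acting from the left on an operator."""
    return [sum(bra[i] * op[i][j] for i in range(len(bra)))
            for j in range(len(op[0]))]


# Components are ordered as (|0>, |1>) here.
ket_1 = [0, 1]
bra_1 = [0, 1]
alpha, beta = 0.6, 0.8j  # hypothetical coefficients
bra_psi = [alpha, beta]  # <psi| = alpha <0| + beta <1|

amplitude = bra_on_ket(bra_psi, ket_1)        # <psi|1> = beta
P_1 = outer(ket_1, bra_1)                     # |1><1|
projected = bra_times_operator(bra_psi, P_1)  # beta <1|
```

Since `bra_psi` already holds the coefficients of the dual vector, no complex conjugation is applied in `bra_on_ket`.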

Why make a distinction if they are identical? The identification between a vector space and its dual is in general only valid for finite dimensional spaces. There are easy arguments showing why things are not so simple in the infinite dimensional case.

Operators and adjoints

The $P_1$ projection above asks a question and collapses the state. It’s an example of an operator acting on the space. Not every operator is however a measurement. A valid measurement in quantum mechanics is one which returns a real number, not a complex one. Operators whose measured values are always real are called Hermitian operators and are defined through the adjoint of an operator. If $P$ is an operator on the space then the adjoint $P^\dagger$ satisfies

$\langle\phi|\,P^\dagger\,|\psi\rangle = \langle\psi|\,P\,|\phi\rangle^*$

for all states $\phi$ and $\psi$, and $P$ is Hermitian if $P^\dagger = P$.

In our binary space you can easily construct such operators. First, note that if you take

$Id = |0\rangle\langle0| + |1\rangle\langle1| $

this is the identity operator, it maps every state onto itself. If you take
$P = |1\rangle\langle0| + |0\rangle\langle1| $
then taking the adjoint swaps the ket and the bra in each term, which merely exchanges the two terms; hence, this operator is identical to its adjoint
$P^\dagger = P $
You also get
$P \cdot P = Id$
so this operator is its own inverse as well.
The proof that Hermitian operators give real values is very simple but requires one to explain the adjoint operation on vectors.
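
In a finite dimensional space the adjoint is simply the conjugate transpose of the matrix, so the claims about $P$ can be verified directly. The sketch assumes the basis order $(\left|0\right\rangle, \left|1\right\rangle)$ and uses hypothetical helper names.

```python
def adjoint(op):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(op), len(op[0])
    return [[complex(op[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def is_hermitian(op):
    """An operator is Hermitian when it equals its own adjoint."""
    return adjoint(op) == [[complex(x) for x in row] for row in op]


# Basis order (|0>, |1>): P = |1><0| + |0><1| swaps the two basis states.
P = [[0, 1],
     [1, 0]]
identity = [[1, 0],
            [0, 1]]

P_squared = matmul(P, P)          # equals the identity
P_is_hermitian = is_hermitian(P)  # P equals its own adjoint

# Counterexample: the single term |0><1| is not its own adjoint.
ket0_bra1 = [[0, 1],
             [0, 0]]
ket0_bra1_is_hermitian = is_hermitian(ket0_bra1)
```

The counterexample shows that Hermiticity is a genuine restriction: only the sum of both terms survives the swap of kets and bras unchanged.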

In the context of quantum computing, the operators applied to states are called gates.


Everything we have said refers to time-independent systems. How do you introduce dynamics in a quantum system? This is where things become a little complicated or, at least, go beyond this basic introduction. If you dig into the textbooks you will discover that the final postulate of quantum mechanics is related to the Hamiltonian of the system and Schrödinger’s equation. The Hamiltonian is nothing but the Hermitian operator which tells you what the energy of the system is, and it also plays a crucial role in the dynamics. Things become complicated here because:

  • you need to understand a few things about operator ordering
  • you have to understand the transition from classic dynamics to quantum dynamics
  • you have to understand how solutions of Schrödinger’s equation lead to the concept of tunneling and quantized energy levels.

Personally I think that path integrals are easier to approach but they also demand a whole lot of machinery.

What you need to know in the context of quantum computing and quantum ML is the following:

  • the Hamiltonian defines the dynamics and the constraints of your system. See a layman’s explanation of adiabatic quantum computing for example.
  • the Hamiltonian defines the lowest energy state and this state is used in quantum computers to bypass calculating minima. That is, due to the fact that nature goes to low-energy states as temperature drops (i.e. absence of excitations) you can use this to find the minimum value of functions. Imagine being able to use neural networks without having to use things like stochastic gradient descent.
  • time is an external and purely classical concept in quantum mechanics and quantum computing. While entanglement leads to all sorts of issues it does not affect the notion of time. Time is however a topic on its own if you go to quantum field theory and quantum gravity where time becomes a dynamic variable of the system. I don’t think that relativistic quantum machine learning is around the corner though.
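
The link between lowest energy state and minimization can be illustrated with a toy encoding. The sketch assumes a made-up cost function on the integers 0..7 and a diagonal Hamiltonian $H = \sum_x f(x)\left|x\right\rangle\langle x|$. The classical enumeration below only shows what the encoding means; it says nothing about how a quantum device actually reaches the ground state.

```python
def cost(x):
    """Hypothetical function to minimize over the integers 0..7."""
    return (x - 5) ** 2 + 3


# Diagonal Hamiltonian H = sum_x cost(x) |x><x|: each basis state |x>
# is an energy eigenstate whose energy is cost(x).
energies = [cost(x) for x in range(8)]

ground_energy = min(energies)                 # lowest energy = minimum of cost
ground_state = energies.index(ground_energy)  # label x of the lowest-energy state
```

Finding the ground state of this Hamiltonian is the same problem as finding the minimum of `cost`, which is why a system that relaxes to its lowest energy state can act as an optimizer.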

At this point you have all you need to step into quantum machine learning. This is covered in our next article.