Review of relevant machine learning methods

In this section, we review key terms and methods from classical machine learning that will help us better understand the workflows in quantum machine learning. We first introduce some general terms, then take a deeper dive into two types of machine learning: kernel methods (especially in the context of support vector machines) and neural networks. There are certainly connections between these methods, but we treat them as distinct because of the differences in the quantum workflows discussed here and in later lessons. This is only a cursory overview, and we skip a great deal of nuance. For a more complete overview of machine learning, we recommend resources like [1-3].

Types of machine learning

Supervision

As a simple definition, machine learning is a collection of algorithms that analyze and draw inferences from patterns and relationships in data. Broadly speaking, machine learning algorithms can be grouped into three main categories depending on the type of data involved and how algorithms learn without being explicitly programmed:

Supervised learning: In supervised learning, the data that is used to train the model is labeled. The goal of these algorithms is to learn the relationship between data and their corresponding labels or outputs and to generalize this to unseen data. Common tasks in this class are classification and regression.
Unsupervised learning: In contrast to supervised learning, unsupervised learning uses unlabeled data to train the machine learning model. The goal of such algorithms is to discover hidden patterns and structure in data. Some algorithms in this class are clustering and dimensionality reduction algorithms. Some generative models such as generative adversarial networks and variational autoencoders can also be considered in this category.
Reinforcement learning: Algorithms in this machine learning category are defined by an agent which interacts with an environment. The agent takes actions and receives feedback from its environment in the form of rewards and punishments. Eventually through this feedback mechanism, the agent learns to take the correct set of actions to perform a specific task.
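To make the difference between the first two categories concrete, here is a minimal sketch in plain Python. The toy data, the nearest-centroid classifier, and the two-cluster k-means routine are illustrative assumptions chosen for brevity; they are not methods used elsewhere in this lesson.

```python
# Toy 1D dataset with two well-separated groups (values made up for illustration).
points = [0.9, 1.1, 1.3, 4.8, 5.0, 5.3]
labels = [0, 0, 0, 1, 1, 1]  # only the supervised learner sees these


def mean(values):
    return sum(values) / len(values)


# Supervised: learn one centroid per label, then classify unseen points.
def fit_centroids(xs, ys):
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}


def predict(centroids, x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))


# Unsupervised: 1D k-means with k=2, which never looks at the labels.
def two_means(xs, iterations=10):
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


centroids = fit_centroids(points, labels)
print(predict(centroids, 1.0), predict(centroids, 4.5))
print(two_means(points))
```

The supervised routine needs the labels to build its model, while the clustering routine recovers the same two groups from the structure of the data alone.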

A diagrammatic representation of supervised and unsupervised learning.

Introducing “quantum” to machine learning

We can now start exploring how “quantum” is introduced into machine learning. In this broader categorization, we consider the type of model/algorithm on the processing device, as well as the type of data provided to it. The diagram below summarizes the possible combinations.

A diagram showing quadrants with algorithms on one axis and data on the other, where each can be quantum or classical in nature.

For instance, CC means that we have a classical dataset - such as images, sound, or text that we can store on classical computers - and that we also use a classical computer to run a machine learning algorithm. This is precisely the classical machine learning setting. On the other hand, QQ means that we are using a quantum computer to process quantum data. Here, “quantum data” could mean several things, and could be context-dependent. Quantum data could be thought of as a set of measurement outcomes obtained from a quantum device, or it could refer to states that have been prepared on a quantum computer by another algorithm. In the future, it could even refer to data stored in QRAM (Quantum Random Access Memory), which does not currently exist. When researchers talk about quantum machine learning, they usually refer to the CQ regime, where the dataset at hand is classical and the processing device executing the machine learning algorithm is a quantum computer. In the following parts of the course, we will focus on such algorithms.

Support vector machines

We now recap a class of algorithms called support vector machines from a classical machine learning point of view. Later we will show how to bring quantum computing into this algorithm.

Consider a binary classification task on a dataset with a two-dimensional feature space, as shown in the plot. One way to classify this dataset is to find a line, or in general a hyperplane, that separates the two classes. In practice there can be infinitely many separating hyperplanes, so the question is: How do we define the optimal one? The idea is that a particularly good decision boundary should maximize the margin, which is defined as the distance from the boundary to the nearest points in each class. In this setting, the data points with the smallest distance to the decision boundary are called support vectors.
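The notions of margin and support vectors can be sketched in a few lines of Python. The dataset and the two candidate boundaries below are hypothetical, and neither boundary is claimed to be the optimal one; the sketch only shows how the margin of a given separating line is measured and which points attain it.

```python
import math

# Hypothetical 2D training set with labels +1 / -1.
X = [(1.0, 1.0), (2.0, 0.0), (0.0, 3.0), (3.0, 3.0), (4.0, 2.0), (5.0, 5.0)]
y = [-1, -1, -1, 1, 1, 1]


def signed_distance(w, b, x):
    """Signed distance from point x to the line w . x + b = 0."""
    return (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(*w)


def margin(w, b, X, y):
    # label * signed distance is positive only for correctly classified points,
    # so the minimum is the distance of the closest point to the boundary.
    return min(label * signed_distance(w, b, x) for x, label in zip(X, y))


def support_vectors(w, b, X, y, tol=1e-9):
    m = margin(w, b, X, y)
    return [x for x, label in zip(X, y)
            if abs(label * signed_distance(w, b, x) - m) < tol]


# Two candidate separating lines: x1 + x2 = 4.5 and x1 = 2.5.
w_a, b_a = (1.0, 1.0), -4.5
w_b, b_b = (1.0, 0.0), -2.5

print(margin(w_a, b_a, X, y), margin(w_b, b_b, X, y))
print(support_vectors(w_a, b_a, X, y))
```

Both lines separate the classes, but the first has the larger margin, so by the criterion above it is the better decision boundary of the two.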

The primal formulation, f_1, and the dual formulation, f_2, of the linear boundary model are:

f_1(\vec{x}) = \Theta^T \Phi(\vec{x})+b

f_2(\vec{x}) = \sum_{i=1}^n \alpha_i y_i \Phi^T(\vec{x}_i)\Phi(\vec{x})+b
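The two expressions above give the same value when the weight vector is a combination of the feature-mapped training points, \Theta = \sum_i \alpha_i y_i \Phi(\vec{x}_i). That relation is the standard SVM one and is assumed here rather than taken from the text. The sketch below checks the agreement numerically; the feature map, the training points, and the coefficients \alpha_i are hypothetical values, not the result of any optimization.

```python
def phi(x):
    """Illustrative feature map from 2D to 3D (an assumption for this sketch)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical training data and dual coefficients.
train_X = [(1.0, 2.0), (-1.0, 0.5), (0.5, -1.5)]
train_y = [1, -1, 1]
alphas = [0.3, 0.5, 0.2]
b = 0.1

# Assumed relation between primal weights and dual coefficients:
# Theta = sum_i alpha_i * y_i * Phi(x_i)
theta = [sum(a * yi * phi(xi)[k] for a, yi, xi in zip(alphas, train_y, train_X))
         for k in range(3)]


def f1(x):
    """Primal form: needs the explicit weight vector Theta."""
    return dot(theta, phi(x)) + b


def f2(x):
    """Dual form: needs only inner products with the training points."""
    return sum(a * yi * dot(phi(xi), phi(x))
               for a, yi, xi in zip(alphas, train_y, train_X)) + b


x_new = (0.7, -0.2)
print(f1(x_new), f2(x_new))
```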

Kernel methods and how quantum can play a role

The video below motivates how quantum can play a role in linear classifiers. This is described in greater detail in the text.

Moving to higher dimensional spaces

In this and the following subsection, the discussion focuses on mappings to higher dimensions. The point here is to explain the "kernel trick" in the context of mappings between spaces, and thus set the stage for what a quantum kernel is. The point is not that higher dimensions in quantum wave functions solve all of our problems. As mentioned in the introduction, classical Gaussian feature maps are already infinite-dimensional. The dimensionality of data features is important, but high-dimensional quantum states are not sufficient for improvement over classical methods.


A diagram showing a ring of one data type with a second data type filling in the middle of the ring. A second cell shows the data projected into 3D, as in a bowl shape. Now the data are linearly separable.

For example, a feature map \Phi could take the two-dimensional input

\vec{x} = \begin{pmatrix}x_1 \\ x_2 \end{pmatrix}

to the three-dimensional vector

\vec{\Phi}(\vec{x}) = \begin{pmatrix}x_1 \\ x_2 \\ x_1 x_2\end{pmatrix}
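As a small sketch of why such a mapping helps, consider four points in an XOR-like arrangement, which no straight line in the original two dimensions can separate. The data and the chosen hyperplane are illustrative assumptions. After applying the map with third component x_1 x_2, a plane that looks only at that third component separates the classes.

```python
def feature_map(x):
    """Map (x1, x2) to (x1, x2, x1 * x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


# XOR-like toy data: the label is the sign of the product x1 * x2.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]


def classify(x, theta=(0.0, 0.0, 1.0), b=0.0):
    # Linear decision function in the 3D feature space: Theta . Phi(x) + b
    score = sum(t * f for t, f in zip(theta, feature_map(x))) + b
    return 1 if score > 0 else -1


predictions = [classify(p) for p in points]
print(predictions)
```

The classifier is still linear, but in the feature space rather than in the original input space.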

Some feature maps may map into very high-dimensional spaces. In such cases, the high dimensionality makes inner products more computationally expensive. We will return to that point below.

Why is the dual form useful?

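Recall the primal and dual formulations of our linear boundary model:

f_1(\vec{x}) = \Theta^T \Phi(\vec{x})+b

f_2(\vec{x}) = \sum_{i=1}^n \alpha_i y_i \Phi^T(\vec{x}_i)\Phi(\vec{x})+b

In the dual form, the feature-mapped data vectors enter only through inner products of the form

\vec{\Phi}^T(\vec{x}_i)\cdot \vec{\Phi}(\vec{x}_j)

An inner product of this kind, viewed as a function of the two data vectors, is the kernel function k(\vec{x}_i,\vec{x}_j).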

The kernel function itself is a function of two input data vectors. Inserting each pair of data vectors in the dataset as arguments of the kernel function results in a symmetric, positive semi-definite matrix, called the kernel matrix:

k = \begin{pmatrix}k(\vec{x}_1,\vec{x}_1) & k(\vec{x}_1,\vec{x}_2) & \cdots \\ k(\vec{x}_2,\vec{x}_1) & k(\vec{x}_2,\vec{x}_2) & \cdots \\ \vdots & \vdots & \ddots\end{pmatrix}
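The construction of a kernel matrix can be sketched as follows. The Gaussian kernel, the value of gamma, and the data points are assumptions chosen for illustration. The random quadratic-form test is only a spot check that is consistent with positive semi-definiteness; it is not a proof.

```python
import math
import random


def gaussian_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * |x - y|^2); gamma is an illustrative choice."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


data = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0), (2.0, 2.0)]

# Kernel matrix: entry (i, j) is k(x_i, x_j).
K = [[gaussian_kernel(xi, xj) for xj in data] for xi in data]


def is_symmetric(M, tol=1e-12):
    n = len(M)
    return all(abs(M[i][j] - M[j][i]) <= tol for i in range(n) for j in range(n))


def quadratic_form(M, z):
    n = len(M)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))


# Spot check: z^T K z should never be negative for a PSD matrix.
rng = random.Random(0)
spot_checks = [quadratic_form(K, [rng.uniform(-1, 1) for _ in data])
               for _ in range(100)]

print(is_symmetric(K), min(spot_checks) >= 0)
```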


Quantum kernels


An abstract representation of a kernel as a circuit.

As we will see later in the course, we can use measurements on a quantum circuit like the one shown above to estimate a kernel, and we can then run SVM optimization classically on the kernel matrix to learn the tunable parameters.
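The idea of estimating a kernel from measurements can be previewed with a one-qubit simulation in plain Python. Everything specific here is an assumption made for illustration and is not the circuit used in this course: the feature map is a single RY(x) rotation on |0>, the kernel is taken to be the squared overlap of two feature states, and the measurement is mimicked by sampling from the exact probability of reading out 0.

```python
import math
import random


def feature_state(x):
    """Amplitudes of RY(x)|0> = (cos(x/2), sin(x/2)); a hypothetical feature map."""
    return (math.cos(x / 2), math.sin(x / 2))


def exact_kernel(x, y):
    """Squared overlap between the two feature states."""
    a, b = feature_state(x), feature_state(y)
    overlap = a[0] * b[0] + a[1] * b[1]
    return overlap ** 2


def estimated_kernel(x, y, shots=4000, seed=0):
    """Mimic measurement: the kernel equals the probability of outcome 0,
    so it can be estimated as the fraction of shots that return 0."""
    rng = random.Random(seed)
    p_zero = exact_kernel(x, y)
    zeros = sum(1 for _ in range(shots) if rng.random() < p_zero)
    return zeros / shots


data = [0.1, 0.4, 2.5]
kernel_matrix = [[estimated_kernel(xi, xj) for xj in data] for xi in data]
for row in kernel_matrix:
    print(row)
```

A matrix estimated in this way would then be handed to a classical SVM optimizer, exactly as with a classically computed kernel matrix.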

Variational quantum classifiers and neural networks

Another class of near-term quantum machine learning algorithms is based on "variational quantum circuits" (VQCs). When these circuits are used in a classification task, you may see the same acronym used to refer to "variational quantum classifiers" (also VQCs). These often leverage structures similar to classical neural networks (NNs), and in those cases you will see them described as quantum neural networks (QNNs). It is important to understand that VQCs are more general and do not need to follow an NN structure, but we begin in analogy with NNs to help clarify the role that quantum can play in existing machine learning workflows. We will then discuss generalizations. We begin by recapping classical neural networks.

The video below gives a brief review of neural networks, and where they overlap with variational quantum circuits. This is explored more in the text.

A neural network is a computational model which is loosely inspired by the structure and the function of neurons in a brain. These neurons, which are nodes that we see in the picture, are organized into layers, and are connected through weights.

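Functions like

f(\vec{x}) = \sigma (\vec{w}\cdot \vec{x}+\vec{b})

act on an input \vec{x} by applying weights \vec{w} and biases \vec{b}, followed by a non-linear activation function \sigma.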
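A function of this form can be sketched for a single neuron as follows. The sigmoid activation, the scalar bias, and the numerical values are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """A common choice for the activation function sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Single neuron: weighted sum of the inputs plus a bias, then activation."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(weighted_sum + b)


# Hypothetical input, weights, and bias.
x = (0.5, -1.0, 2.0)
w = (0.4, 0.3, -0.2)
b = 0.1

print(neuron(x, w, b))
```

A layer of a network applies many such neurons to the same input, each with its own weights and bias.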

You might start your neural network with a random set of weights and biases, or from a known reasonable starting configuration. From there, the idea is to check how well your neural network classifies things and improve it. We use a cost function to describe how our neural network deviates from correct classification. There are many ways of defining a cost function. Here we describe one common example, which involves the mean-squared error (MSE):

C(w^L_{m,n},b^L_n) = \frac{1}{N_\text{train}}\sum_{i=1}^{N_\text{train}}\sum_{j=1}^{N_\text{outputs}}{(v_{i,j}-p_{i,j})^2}

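Here v_{i,j} is the correct value of output j for training example i, and p_{i,j} is the corresponding value predicted by the network.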
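The cost function above can be evaluated directly. The sketch below assumes that v holds the correct output values and p the network's predictions, and it normalizes by the number of training examples; the numbers are made up for illustration.

```python
def mse_cost(values, predictions):
    """Sum of squared errors over all outputs, averaged over training examples."""
    n_train = len(values)
    total = 0.0
    for v_row, p_row in zip(values, predictions):
        for v, p in zip(v_row, p_row):
            total += (v - p) ** 2
    return total / n_train


# Two training examples, two outputs each (hypothetical values).
v = [[1.0, 0.0], [0.0, 1.0]]
p = [[0.8, 0.2], [0.4, 0.6]]

cost = mse_cost(v, p)
print(cost)
```

The cost is zero only when every prediction matches its target, and it grows as predictions drift away from the correct values.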

We then vary the parameters, namely the weights between neurons in each layer and the biases on all neurons. Classical optimization routines like gradient descent are used to search for a local minimum in the cost function.
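As a minimal sketch of this optimization loop, the code below trains a single sigmoid neuron with plain gradient descent on the MSE cost. The logical-OR training set, the zero initialization, the learning rate, and the number of steps are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy training set for logical OR (inputs and target outputs).
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0.0, 1.0, 1.0, 1.0]


def cost(w, b):
    preds = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x in inputs]
    return sum((v - p) ** 2 for v, p in zip(targets, preds)) / len(inputs)


def gradient(w, b):
    """Analytic gradient of the MSE cost with respect to the weights and bias."""
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, v in zip(inputs, targets):
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        common = 2.0 * (p - v) * p * (1.0 - p) / len(inputs)
        grad_w[0] += common * x[0]
        grad_w[1] += common * x[1]
        grad_b += common
    return grad_w, grad_b


w, b = [0.0, 0.0], 0.0
initial_cost = cost(w, b)
learning_rate = 0.5

for _ in range(2000):
    grad_w, grad_b = gradient(w, b)
    # Step against the gradient to reduce the cost.
    w = [w[0] - learning_rate * grad_w[0], w[1] - learning_rate * grad_w[1]]
    b -= learning_rate * grad_b

final_cost = cost(w, b)
print(initial_cost, final_cost)
```

In a multi-layer network the same update rule is applied to every weight and bias, with the gradients obtained by backpropagation.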

Quantum perceptron

To build the quantum counterpart of the perceptron, one of the things we need to consider is how to implement non-linearity with quantum circuits, which is the role played by the activation function in classical neural networks. This is because, without additional considerations, quantum circuits only implement unitary operations, which are linear. There are different methods for introducing non-linearity into quantum circuits. One of the main methods is to use measurements as a source of non-linearity. Other options include quantum Fourier transform based methods, mid-circuit measurements or dynamic circuits, and tracing qubits out of the circuit.

Quantum neural network


The data loading and weightings are linear operations.
The measurements are non-linear.
So as in the classical NN, we have both linear and non-linear components.
The weight circuits still have variational parameters, so there is still a classical minimization to be carried out.

A representation of a quantum neural network as a circuit.

f_{QNN}(x) = \langle 0|U^{\dagger}(x)W^{\dagger}OWU(x)|0\rangle
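The expectation value above can be evaluated by hand for a one-qubit toy model. The sketch assumes, purely for illustration, that the data-loading unitary is U(x) = RY(x), that the weight unitary is W = RY(theta) with a single trainable angle, and that the observable O is Pauli Z. Under those assumptions the model output reduces to cos(x + theta).

```python
import math


def ry(angle):
    """Matrix of a rotation about the Y axis (real-valued)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def apply(gate, state):
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def f_qnn(x, theta):
    """Expectation value of Z in the state W(theta) U(x) |0>."""
    state = apply(ry(x), [1.0, 0.0])      # data loading, U(x)
    state = apply(ry(theta), state)       # trainable weights, W
    # <Z> = P(outcome 0) - P(outcome 1); amplitudes are real here.
    return state[0] ** 2 - state[1] ** 2


print(f_qnn(0.3, 0.5), math.cos(0.8))
```

The circuit itself is linear in the state, while the measured expectation value is a non-linear function of the input x, which mirrors the split into linear and non-linear components described above.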

Generalizations

We can now look at one of the ways of constructing the quantum counterpart of a neural network. In this model, the information flow is different from a classical feed-forward neural network. In the classical setting, information would flow from left to right, starting with the input and ending with the model output, and in the reverse direction when doing backpropagation to train the model.

A diagram showing several layers of gates within a quantum neural network

However, in this quantum neural network construction, the unitary block that encodes the data is repeated between the variational unitary blocks that hold the trainable parameters. This strategy, which we refer to as “data reuploading”, is backed by interesting theoretical results. In fact, a paper by Pérez-Salinas et al. shows that, with repeated data reuploading, “a single qubit provides sufficient computational capabilities to construct a universal quantum classifier when assisted with a classical subroutine.” Data reuploading is therefore a technique that we can use to enhance the expressiveness and representational power of the model, allowing the quantum neural network to approximate complex functions.
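The layered structure described here can be sketched for a single qubit. The gate choices are assumptions made for illustration and are not taken from the paper cited above: the data is encoded with RZ(x), each trainable block is a single RY(theta) rotation, and the model output is the expectation value of Pauli Z.

```python
import cmath
import math


def ry(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def rz(angle):
    return [[cmath.exp(-0.5j * angle), 0], [0, cmath.exp(0.5j * angle)]]


def apply(gate, state):
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def reuploading_model(x, thetas):
    """One initial trainable rotation, then repeated (encode x, trainable) layers."""
    state = apply(ry(thetas[0]), [1, 0])
    for theta in thetas[1:]:
        state = apply(rz(x), state)       # data-encoding block, repeated each layer
        state = apply(ry(theta), state)   # trainable block
    # Expectation value of Z: P(outcome 0) - P(outcome 1)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


thetas = [0.2, 0.3, 0.4]  # hypothetical parameters: two reuploading layers
for x in (0.0, 0.5, 1.0, 1.5):
    print(x, reuploading_model(x, thetas))
```

Adding more entries to thetas adds more layers, so the same input x is encoded more times and the output can depend on x in a richer way.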

References

[1] "Reinforcement Learning: An Introduction", Richard S. Sutton and Andrew G. Barto, MIT Press, Second Edition, Cambridge, MA, 2018.

[2] "Pattern Recognition and Machine Learning", Christopher M. Bishop, Springer, 2006.

[3] "Foundations of Machine Learning", Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, MIT Press, Second Edition, 2018.
