
The variational quantum eigensolver (VQE)

This lesson will introduce the variational quantum eigensolver, explain its importance as a foundational algorithm in quantum computing, and explore its strengths and weaknesses. VQE by itself, without augmenting methods, is not likely to be sufficient for modern utility-scale quantum computations. It is nevertheless important as an archetypal classical-quantum hybrid method, and it is an important foundation upon which many more advanced algorithms are built.

This video gives an overview of VQE and factors that affect its efficiency. The text below adds more detail and implements VQE using Qiskit.


1. What is VQE?

The variational quantum eigensolver is an algorithm that uses classical and quantum computing in conjunction to accomplish a task. There are four main components of a VQE calculation:

  • An operator: Often a Hamiltonian, which we’ll call $H$, that describes a property of your system that you wish to optimize. Another way of saying this is that you are seeking the eigenvector of this operator that corresponds to the minimum eigenvalue. We often call that eigenvector the “ground state”.
  • An “ansatz” (a German word meaning “approach”): this is a quantum circuit that prepares a quantum state approximating the eigenvector you’re seeking. Really the ansatz is a family of quantum circuits, because some of the gates in the ansatz are parametrized, that is, they are fed a parameter which we can vary. This family of quantum circuits can prepare a family of quantum states approximating the ground state.
  • An estimator: a means of estimating the expectation value of the operator $H$ over the current variational quantum state. Sometimes what we really care about is simply this expectation value, which we call a cost function. Sometimes, we care about a more complicated function that can still be written starting from one or more expectation values.
  • A classical optimizer: an algorithm that varies parameters to try to minimize the cost function.

Let's look at each of these components in more depth.

1.1 The operator (Hamiltonian)

At the core of a VQE problem is an operator that describes a system of interest. We will assume here that the lowest eigenvalue and the corresponding eigenvector of this operator are useful for some scientific or business purpose. Examples might include a chemical Hamiltonian describing a molecule, such that the lowest eigenvalue of the operator corresponds to the ground state energy of the molecule, and the corresponding eigenstate describes the geometry or electron configuration of the molecule. Or the operator could describe a cost of a certain process to be optimized, and the eigenstates could correspond to routes or practices. In some fields, like physics, a "Hamiltonian" almost always refers to an operator describing the energy of a physical system. But in quantum computing, it is common to see quantum operators that describe a business or logistical problem also referred to as a "Hamiltonian". We will adopt that convention here.

An image of atomic orbitals and an image of a network of many nodes and connections between them.

Mapping a physical or optimization problem to qubits is typically a non-trivial task, but those details are not the focus of this course. A general discussion of mapping a problem to a quantum operator can be found in Quantum computing in practice. A more detailed look at the mapping of chemistry problems into quantum operators can be found in Quantum Chemistry with VQE.

For the purposes of this course, we will assume the form of the Hamiltonian is known. For example, a Hamiltonian for a simple hydrogen molecule (under certain active space assumptions, and using the Jordan-Wigner mapper) is:

from qiskit.quantum_info import SparsePauliOp
 
hamiltonian = SparsePauliOp(
    [
        "IIII",
        "IIIZ",
        "IZII",
        "IIZI",
        "ZIII",
        "IZIZ",
        "IIZZ",
        "ZIIZ",
        "IZZI",
        "ZZII",
        "ZIZI",
        "YYYY",
        "XXYY",
        "YYXX",
        "XXXX",
    ],
    coeffs=[
        -0.09820182 + 0.0j,
        -0.1740751 + 0.0j,
        -0.1740751 + 0.0j,
        0.2242933 + 0.0j,
        0.2242933 + 0.0j,
        0.16891402 + 0.0j,
        0.1210099 + 0.0j,
        0.16631441 + 0.0j,
        0.16631441 + 0.0j,
        0.1210099 + 0.0j,
        0.17504456 + 0.0j,
        0.04530451 + 0.0j,
        0.04530451 + 0.0j,
        0.04530451 + 0.0j,
        0.04530451 + 0.0j,
    ],
)

Note that in the Hamiltonian above, there are terms like ZZII and YYYY that do not commute with each other. That is, to evaluate ZZII, we would need to measure the Pauli Z operator on qubit 3 (among other measurements). But to evaluate YYYY, we need to measure the Pauli Y operator on that same qubit, qubit 3. There is an uncertainty relation between Y and Z operators on the same qubit; we cannot measure both of those operators at the same time. We will revisit this point below, and indeed throughout the course. The Hamiltonian above is a $16\times 16$ matrix operator. Diagonalizing the operator to find its lowest energy eigenvalue is not difficult.

import numpy as np
 
A = np.array(hamiltonian)
eigenvalues, eigenvectors = np.linalg.eigh(A)
print("The ground state energy is ", min(eigenvalues), "hartrees")

Output:

The ground state energy is  -1.1459778447627311 hartrees

Brute-force classical eigensolvers cannot scale to describe the energies or geometries of very large systems of atoms, like medications or proteins. VQE is one of the early attempts to leverage quantum computing for this problem.

We will encounter Hamiltonians in this lesson much larger than the one above. But it would be wasteful to push the limits of what VQE can do before we introduce, later in this course, some of the more advanced tools that can augment or replace VQE.

1.2 Ansatz

The word "ansatz" is German for "approach". The correct plural in German is "ansätze", though one often sees "ansatzes" or "ansatze". In the context of VQE, an ansatz is the quantum circuit you use to create a multi-qubit wave function that most closely approximates the ground state of the system you are studying, and which thus produces the lowest expectation value of your operator. This quantum circuit will contain variational parameters (often collected together in the vector of variables θ\vec{\theta}).

An image of a quantum circuit with variational parameters labeled "theta".

An initial set of values $\vec{\theta}_0$ of the variational parameters is chosen. We will call the unitary operation of the ansatz on the circuit $U_{\text{var}}(\vec{\theta}_0)$. By default, all qubits in IBM quantum computers are initialized to the $|0\rangle$ state. When the circuit is run, the state of the qubits will be

$$|\psi(\vec{\theta}_0)\rangle=U_{\text{var}}(\vec{\theta}_0)|0\rangle^{\otimes N}$$

If all we needed were the lowest energy (using the language of physical systems), we could estimate this by simply measuring the energy many times and taking the lowest. But we typically also want the configuration that yields that lowest energy or eigenvalue. So the next step is the estimation of the expectation value of the Hamiltonian, which is accomplished through quantum measurements. A lot goes into that. But we can understand this process qualitatively by noting that the probability $P_j$ of measuring an energy $E_j$ (again using the language of physical systems) is related to the expectation value by:

$$\langle \psi(\vec{\theta}_0) |H|\psi(\vec{\theta}_0) \rangle = \sum_j P_j E_j$$

The probability $P_j$ is also related to the overlap between the eigenstate $|\phi_j\rangle$ and the current state of the system $|\psi(\vec{\theta}_0)\rangle$:

$$P_j=|\langle \phi_j|\psi(\vec{\theta}_0)\rangle|^2 = |\langle \phi_j|U_{\text{var}}(\vec{\theta}_0)|0\rangle^{\otimes N}|^2$$

So by making many measurements of the Pauli operators making up our Hamiltonian, we can estimate the Hamiltonian's expectation value in the current state of the system $|\psi(\vec{\theta}_0)\rangle$. The next step is to vary the parameters $\vec{\theta}$ and try to more closely approach the lowest-energy (ground) state of the system. Because of the variational parameters in the ansatz, one sometimes hears it referred to as the variational form.
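As a quick numerical illustration (a sketch of ours, not part of the original lesson), we can verify with NumPy that the expectation value of the 4-qubit hamiltonian defined above equals the probability-weighted sum of its eigenvalues, for a randomly chosen state:

import numpy as np
 
# Dense matrix of the 4-qubit Hamiltonian defined above
H = hamiltonian.to_matrix()
eigenvalues, eigenvectors = np.linalg.eigh(H)
 
# A random normalized "variational" state
rng = np.random.default_rng(seed=0)
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)
 
# Direct expectation value <psi|H|psi>
expectation = np.real(np.vdot(psi, H @ psi))
 
# Probability-weighted sum of eigenvalues: P_j = |<phi_j|psi>|^2
P = np.abs(eigenvectors.conj().T @ psi) ** 2
print(np.isclose(expectation, np.sum(P * eigenvalues)))  # True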

Before we move on to that variational process, note that it is often useful to start from a "good guess" state. You might know enough about your system to make a better initial guess than $|0\rangle^{\otimes N}$. For example, it is common to initialize qubits to the Hartree-Fock state in chemical applications. This starting guess, which does not contain any variational parameters, is called the reference state. Let us call the quantum circuit used to create the reference state $U_{\text{ref}}$. Whenever it becomes important to distinguish the reference state from the rest of the ansatz, use $U_{\text{ansatz}}(\vec{\theta}) = U_{\text{var}}(\vec{\theta})\,U_{\text{ref}}$. Equivalently,

$$\begin{aligned} |\psi_{\text{ref}}\rangle&=U_{\text{ref}}|0\rangle^{\otimes N}\\ |\psi_{\text{ansatz}}(\vec{\theta})\rangle&=U_{\text{var}}(\vec{\theta})|\psi_{\text{ref}}\rangle = U_{\text{var}}(\vec{\theta})U_{\text{ref}}|0\rangle^{\otimes N}. \end{aligned}$$
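As a brief sketch (our own illustration; the occupation pattern below is a hypothetical stand-in for a Hartree-Fock state), the reference circuit and variational form can be composed in Qiskit like this:

from qiskit import QuantumCircuit
from qiskit.circuit.library import EfficientSU2
 
# Reference state: a hypothetical two-particle occupation as the "good guess"
reference = QuantumCircuit(4)
reference.x([0, 1])
 
# Variational form carrying the parameters to be optimized
variational_form = EfficientSU2(4, reps=1)
 
# U_ansatz = U_var U_ref, applied to |0000>
ansatz = reference.compose(variational_form)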

1.3 Estimator

We need a way to estimate the expectation value of our Hamiltonian in a particular variational state $|\psi(\vec{\theta})\rangle$. If we could directly measure the entire operator $H$, this would be as simple as making many (say $N$) measurements and averaging the measured values:

$$\langle \psi(\vec{\theta})|H|\psi(\vec{\theta})\rangle _N \approx \frac{1}{N}\sum_{j=1}^N E_j$$

Here, the $\approx$ symbol reminds us that this expectation value would only be exactly correct in the limit $N\rightarrow \infty$. But with thousands of measurements being made on a circuit, the sampling error of the expectation value is fairly low. Other considerations, such as noise, become an issue for very precise calculations.
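The effect of the number of measurements on sampling error can be illustrated with a synthetic sketch (the "energies" below are made-up numbers, not hardware data); the standard error of the mean shrinks like $1/\sqrt{N}$:

import numpy as np
 
rng = np.random.default_rng(1)
true_energy, spread = -1.0, 0.5  # hypothetical mean and spread of measured energies
for n in (100, 10_000, 1_000_000):
    samples = rng.normal(true_energy, spread, size=n)  # stand-ins for measured E_j
    print(n, round(samples.mean(), 4), "+/-", round(spread / np.sqrt(n), 4))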

However, it is generally not possible to measure $H$ all at once. $H$ may contain multiple non-commuting Pauli X, Y, and Z operators. So the Hamiltonian must be broken up into groups of operators that can be simultaneously measured; each such group is estimated separately, and the results are combined to obtain an expectation value. We will revisit this in greater detail in the next lesson, when we discuss the scaling of classical and quantum approaches. This complexity in measurement is one reason we need highly efficient code for carrying out such estimation. In this lesson and beyond, we will use the Qiskit Runtime primitive Estimator for this purpose.
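For intuition, the quantity that Estimator approximates from shot counts can be computed exactly for small cases with a statevector (a sketch of ours; the trial circuit here is arbitrary):

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector
 
# An arbitrary 4-qubit trial state, standing in for an ansatz state
qc = QuantumCircuit(4)
qc.h(range(4))
 
# Exact expectation value of the chemistry Hamiltonian defined above
print(Statevector(qc).expectation_value(hamiltonian))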

1.4 Classical optimizers

A classical optimizer is any classical algorithm designed to find the extrema of a target function (typically a minimum). Such optimizers search through the space of possible parameters for a set that minimizes the function of interest. They can be broadly categorized into gradient-based methods, which use gradient information, and gradient-free methods, which operate as black-box optimizers. The choice of classical optimizer can significantly affect an algorithm's performance, especially in the presence of noise on quantum hardware. Popular optimizers in this field include Adam, AMSGrad, and SPSA, which have shown promising results in noisy environments. More traditional optimizers include COBYLA and SLSQP.

A common workflow (demonstrated in Section 3.3) is to use one of these algorithms as the method inside a minimizer like SciPy's minimize function, which takes as its arguments:

  • Some function to be minimized: often the energy expectation value, generally referred to as a "cost function".
  • A set of parameters from which to begin the search, often called $x_0$ or $\vec{\theta}_0$.
  • Arguments, including arguments of the cost function. In quantum computing with Qiskit, these arguments will include the ansatz, the Hamiltonian, and the estimator, which was discussed in the previous subsection.
  • A 'method' of minimization. This refers to the specific algorithm used to search the parameter space. This is where we would specify, for example, COBYLA or SLSQP.
  • Options. The options available may differ by method. But an example which practically all methods would include is the maximum number of iterations of the optimizer before ending the search: 'maxiter'.

An image showing a curved line representing energy with several points at which the value is being tested to find the minimum.

At each iterative step, the expectation value of the Hamiltonian is estimated by making many measurements. This estimated energy is returned by the cost function, and the minimizer updates the information it has about the energy landscape. Exactly what the optimizer does to choose the next step varies from method to method. Some use gradients and select the direction of steepest descent. Others may take noise into account and may require that the cost decrease by a large margin before accepting that the true energy decreases along that direction.

# Example syntax for minimization
# from scipy.optimize import minimize
# res = minimize(cost_func, x0, args=(ansatz, hamiltonian, estimator), method="cobyla", options={'maxiter': 200})

1.5 The variational principle

In this context, the variational principle is very important: it states that no variational wave function can yield an energy (or cost) expectation value lower than that yielded by the ground state wave function. Mathematically,

$$E_\text{var}=\langle \psi_\text{var}|H|\psi_\text{var}\rangle \geq E_\text{min}=\langle \psi_0|H|\psi_0\rangle$$

This is easy to verify if we note that the set of all eigenstates $\{|\psi_0\rangle, |\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_n\rangle\}$ of $H$ forms a complete basis for the Hilbert space. In other words, any state, and in particular $|\psi_\text{var}\rangle$, can be written as a weighted (normalized) sum of these eigenstates of $H$:

$$|\psi_\text{var}\rangle=\sum_{i=0}^n c_i |\psi_i\rangle$$

where the $c_i$ are constants to be determined, and $\sum_{i=0}^n |c_i|^2 = 1$. We leave the verification as an exercise for the reader (see the check-in question below). But note the implication: the variational state that produces the lowest-energy expectation value is the best estimate of the true ground state.

Check-in question

Verify mathematically that $E_\text{var}\geq E_0$ for any variational state $|\psi_\text{var}\rangle$.

Answer:

Using the given expansion of the variational state in terms of the energy eigenstates,

$$|\psi_\text{var}\rangle=\sum_{i=0}^n c_i |\psi_i\rangle,$$

we can write the variational energy expectation value as

$$\begin{aligned} E_\text{var}&=\langle \psi_\text{var}|H|\psi_\text{var}\rangle =\left(\sum_{i=0}^n c^*_i \langle \psi_i|\right)H\left(\sum_{j=0}^n c_j |\psi_j\rangle\right)\\ &=\left(\sum_{i=0}^n c^*_i \langle \psi_i|\right)\left(\sum_{j=0}^n c_j E_j|\psi_j\rangle\right)\\ &=\sum_{i,j=0}^n c^*_i c_j E_j \langle \psi_i|\psi_j\rangle\\ &=\sum_{i,j=0}^n c^*_i c_j E_j \delta_{i,j}\\ &=\sum_{i=0}^n |c_i|^2 E_i. \end{aligned}$$

For all coefficients, $0\leq|c_i|^2\leq 1$. So we can write

$$\begin{aligned} E_\text{var}&=\sum_{i=0}^n |c_i|^2 E_i\geq \sum_{i=0}^n |c_i|^2 E_0 = E_0 \sum_{i=0}^n |c_i|^2 = E_0(1) \\ E_\text{var}&\geq E_0 \end{aligned}$$

2. Comparison with classical workflow

Let’s say we are interested in a matrix with $N$ rows and $N$ columns. Suppose the matrix is so large that exact diagonalization is not an option. Suppose further that you know enough about your problem that you can make some guesses about the overall structure of the target eigenstate, and you want to probe states similar to your initial guess to see if your cost/energy can be lowered further. This is a variational approach, and it is one method used when exact diagonalization is not an option.

2.1 Classical workflow

Using a classical computer, this would work as follows:

  • Make a guess state, with some parameters $\vec{\theta}_i$ that you will vary: $|\psi(\vec{\theta}_i)\rangle$. Although this initial guess could be random, that is not advisable. We want to use knowledge of the problem at hand to tailor our guess as much as possible.
  • Calculate the expectation value of the operator with the system in that state: $\langle\psi(\vec{\theta}_i)|H|\psi(\vec{\theta}_i)\rangle$
  • Alter the variational parameters and repeat: $\vec{\theta}_i\rightarrow \vec{\theta}_{i+1}$.
  • Use accumulated information about the landscape of possible states in your variational subspace to make better and better guesses and approach the target state. The variational principle guarantees that our variational state cannot yield an eigenvalue lower than that of the target ground state. So the lower the expectation value, the better our approximation of the ground state:

$$\min_{\vec{\theta}} \left\{ E_{\text{var},i} = \langle\psi(\vec{\theta}_i)|H|\psi(\vec{\theta}_i)\rangle \right\} \geq E_0$$

Let us examine the difficulty of each step in this approach. Setting or updating parameters is computationally easy; the difficulty there is in selecting useful, physically motivated initial parameters. Using accumulated information from prior iterations to update parameters in such a way that you approach the ground state is non-trivial. But classical optimization algorithms exist that do this quite efficiently. This classical optimization is only expensive because it may require many iterations; in the worst case, the number of iterations may scale exponentially with $N$. The most computationally expensive single step is almost certainly calculating the expectation value of your matrix in a given state $|\psi(\vec{\theta}_i)\rangle$: $\langle\psi(\vec{\theta}_i)|H|\psi(\vec{\theta}_i)\rangle$.

The $N\times N$ matrix must act on the $N$-element vector, which corresponds to $O(N^2)$ multiplication operations in the worst case. This must be done at each iteration of parameters. For extremely large matrices, this has a high computational cost.
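To make the scaling concrete, here is a sketch (with a random Hermitian stand-in, not a physical Hamiltonian) of the matrix-vector product that dominates each classical iteration:

import numpy as np
 
n_qubits = 10
N = 2**n_qubits  # the dimension grows exponentially with system size
 
rng = np.random.default_rng(0)
M = rng.normal(size=(N, N))
H = (M + M.T) / 2  # a random Hermitian stand-in for a Hamiltonian
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)
 
# The O(N^2) step, repeated at every iteration of the optimizer
energy = np.real(np.vdot(psi, H @ psi))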

2.2 Quantum workflow and commuting Pauli groups

Now imagine relegating this portion of the calculation to a quantum computer. Instead of calculating this expectation value, you estimate it by preparing the state $|\psi(\vec{\theta}_i)\rangle$ on the quantum computer using your variational ansatz, and then making measurements.

That may sound easier than it is. $H$ is generally not easy to measure. For example, it could be made up of many non-commuting Pauli X, Y, and Z operators. But $H$ can be written as a linear combination of terms $h_\alpha$, each of which is easily measurable (e.g., Pauli operators or groups of qubit-wise commuting Pauli operators). The expectation value of $H$ over some state $|\Psi\rangle$ is the weighted sum of expectation values of the constituent terms $h_\alpha$. This expression holds for any state $|\Psi\rangle$, but we will specifically be using it with our variational states $|\psi(\vec{\theta}_i)\rangle$.

$$H = \sum_{\alpha = 1}^T{c_\alpha h_\alpha}$$

where $h_\alpha$ is a Pauli string like IZZX…XIYX, or several such strings that commute with each other. So a description of the expectation value that more closely matches the realities of measurement on quantum computers is

$$\langle \Psi |H|\Psi \rangle =\sum_{\alpha} c_\alpha \langle \Psi | h_\alpha|\Psi \rangle.$$

And in the context of our variational wave function:

$$\langle \psi(\vec{\theta}_i) |H|\psi(\vec{\theta}_i) \rangle =\sum_{\alpha} c_\alpha \langle \psi(\vec{\theta}_i) | h_\alpha|\psi(\vec{\theta}_i) \rangle$$

Each of the terms $h_\alpha$ can be measured $M$ times, yielding measurement samples $s_{\alpha j}$ with $j = 1\ldots M$, from which we obtain an expectation value $\mu_\alpha$ and a standard deviation $\sigma_\alpha$. We can sum these terms and propagate errors through the sum to obtain an overall expectation value $\mu$ and standard deviation $\sigma$.

$$\begin{aligned} \langle \psi(\vec{\theta}_i) |h_\alpha|\psi(\vec{\theta}_i) \rangle &\simeq \mu _\alpha \pm \frac{\sigma_\alpha}{\sqrt{M}}, &\qquad \mu_\alpha &=\frac{1}{M}\sum_j s_{\alpha,j}, &\qquad \sigma^2_\alpha &=\frac{1}{M-1}\sum_j (s_{\alpha,j}-\mu_\alpha)^2\\ \langle \psi(\vec{\theta}_i) |H|\psi(\vec{\theta}_i) \rangle &\simeq \mu \pm \sigma, &\qquad \mu &= \sum_\alpha c_\alpha \mu_\alpha, &\qquad \sigma^2&=\sum_\alpha c^2_\alpha \frac{\sigma^2_\alpha }{M} \end{aligned}$$

This requires no large-scale multiplication, nor any process that necessarily scales like $N^2$. Instead it requires multiple measurements on the quantum computer. If you don’t need too many of those, this approach could be efficient. And that’s the quantum part of VQE.
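As a bookkeeping sketch (the per-term statistics below are hypothetical numbers, not measured data), the combination rules above look like this in code:

import numpy as np
 
coeffs = np.array([0.3980, -0.3980, -0.0113, 0.1810])  # example c_alpha values
mu_terms = np.array([0.12, -0.85, 0.40, 0.05])  # hypothetical per-term sample means
var_terms = np.array([0.98, 0.27, 0.84, 0.99])  # hypothetical per-term sample variances
M = 10_000  # shots per term
 
mu = np.sum(coeffs * mu_terms)  # overall expectation value
sigma = np.sqrt(np.sum(coeffs**2 * var_terms / M))  # propagated standard error
print(f"{mu:.4f} +/- {sigma:.4f}")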

But let’s talk about reasons why this might not be efficient. One reason for many measurements is to reduce the statistical uncertainty in your estimates, for very high-precision calculations. Another reason is the number of Pauli strings required to span your entire matrix. Because Pauli matrices (plus the identity: X, Y, Z, and I) span the space of all operators of a given dimension, we are guaranteed that we can write our matrix of interest as a weighted sum of Pauli operators, as we did before.

$$H = \sum_{\alpha = 1}^T{c_\alpha h_\alpha}$$

where $h_\alpha$ is a Pauli string acting on all the qubits describing your system, like IZZX…XIYX, or several such strings that commute with each other. Recall that Qiskit uses little-endian notation, in which the $n^\text{th}$ Pauli operator from the right acts on the $n^\text{th}$ qubit. So we can measure our operator by measuring a series of Pauli operators.

But we cannot measure all those Pauli operators simultaneously. Pauli operators (excluding I) do not commute with each other if they are associated with the same qubit. For example, we can measure IZIZ and ZZXZ simultaneously, because we can measure I and Z simultaneously on the 3rd qubit, and I and X simultaneously on the 1st qubit. But we cannot measure ZZZZ and ZZZX simultaneously, because Z and X do not commute, and both act on the 0th qubit.

A table of different Pauli strings, some of which commute and others which do not.
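Qubit-wise commutativity is easy to check in a few lines of Python (a small sketch of ours that compares the strings position by position):

def qubit_wise_commute(a: str, b: str) -> bool:
    """True if two equal-length Pauli strings commute qubit by qubit."""
    return all(pa == "I" or pb == "I" or pa == pb for pa, pb in zip(a, b))
 
print(qubit_wise_commute("IZIZ", "ZZXZ"))  # True
print(qubit_wise_commute("ZZZZ", "ZZZX"))  # False: Z vs. X on qubit 0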

So we decompose our matrix $H$ into a sum of Paulis acting on different qubits. Some elements of that sum can be measured all at once; we call this a group of commuting Paulis. Depending on how many non-commuting terms there are, we may need many such groups. Call the number of such groups of commuting Pauli strings $N_\text{GCP}$. If $N_\text{GCP}$ is small, this could work well. If $H$ has millions of groups, this will not be useful.
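Qiskit can perform this grouping for us. For example, the hydrogen Hamiltonian from Section 1.1 can be sorted into qubit-wise commuting groups; note that the grouping is heuristic, so the count it reports is an upper bound on the minimal $N_\text{GCP}$:

# Sort the 4-qubit hydrogen Hamiltonian into qubit-wise commuting groups
groups = hamiltonian.group_commuting(qubit_wise=True)
print("Number of groups:", len(groups))
for group in groups:
    print(group.paulis)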

The processes required for estimation of the expectation value are collected together in the Qiskit Runtime primitive called Estimator. To learn more about Estimator, see the API reference in IBM Quantum Documentation. One can simply use Estimator directly, but it returns much more than a single expectation value; for example, it also returns information on ensemble standard error. Thus, in the context of minimization problems, one often sees Estimator wrapped inside a cost function. To learn more about Estimator inputs and outputs see this guide on IBM Quantum Documentation.

You record the expectation value (or the cost function) for the set of parameters $\vec{\theta}_i$ used in your state, and then you update the parameters. Over time, you could use the expectation values or cost-function values you’ve estimated to approximate a gradient of your cost function in the subspace of states sampled by your ansatz. Both gradient-based and gradient-free classical optimizers exist. Both suffer from potential trainability issues, like multiple local minima, and large regions of parameter space with near-zero gradient, called barren plateaus.

Two images of a curved line with a minimum value. In one, points are randomly checked in the search for a minimum, in the other a gradient is estimated by drawing a line between two adjacent points.

2.3 Factors that determine computational cost

VQE will not solve all your toughest quantum chemistry problems. But being better at all calculations is not the point: we have shifted what determines the computational cost.

A table comparing classical and quantum variational approaches. Both require good initial guesses. Classically, the cost scales like the dimension of your matrix squared, and in the quantum approach it depends on how many groups of commuting Pauli operators you have.

We’ve shifted from a process whose complexity depends only on matrix dimension to one that depends on the required precision and the number of non-commuting Pauli operators that make up the matrix. The latter dependence has no analog in classical computing.

Based on these dependencies, for sparse matrices, or matrices involving few non-commuting Pauli strings, this process may be useful. This is the case for systems of interacting spins, for example. For dense matrices, it may be less useful. We know for example that chemical systems often have Hamiltonians that involve hundreds, thousands, even millions of Pauli strings. There has been interesting work done to reduce this number of terms. But chemical systems may be better suited to some of the other algorithms we will discuss in this course.

Check-in question

Consider a Hamiltonian on four qubits that contains the terms:

IIXX, IIXZ, IIZZ, IZXZ, IXXZ, ZZXZ, XZXZ, ZIXZ, ZZZZ, XXXX

You want to sort these terms into groups such that all terms in a group can be measured simultaneously. What is the smallest number of such groups you can make such that all terms are accounted for?

Answer:

It can be done in 5 groups. Note that such solutions are typically not unique.

IIXX, XXXX

IIXZ, IZXZ, ZZXZ

IIZZ, ZZZZ

IXXZ, ZIXZ

XZXZ

Check-in question

Which do you expect typically makes quantum chemistry with VQE difficult: the number of terms in the Hamiltonian, or finding a good ansatz?

Answer:

It turns out there are ansätze that are highly optimized for chemical contexts. The number of terms in the Hamiltonian, and hence the number of measurements required, typically causes more problems.


3. Example Hamiltonian

Let us put this algorithm into practice using a small Hamiltonian matrix so that we can see what is happening in each step. We will employ the Qiskit patterns framework:

  • Step 1: Map problem to quantum circuits and operators
  • Step 2: Optimize for target hardware
  • Step 3: Execute on target hardware
  • Step 4: Post-process results

3.1 Step 1: Map the problem to quantum circuits and operators

We will follow the same approach as in the chemistry context above, this time with an even smaller Hamiltonian. We start with some general imports.

# General imports
import numpy as np
 
# SciPy minimizer routine
from scipy.optimize import minimize
 
# Plotting functions
import matplotlib.pyplot as plt

Again, we assume the Hamiltonian of interest is known. We will use an extremely small Hamiltonian here, because other methods discussed in this course will be more efficient at solving larger problems.

from qiskit.quantum_info import SparsePauliOp
import numpy as np
 
hamiltonian = SparsePauliOp.from_list(
    [("YZ", 0.3980), ("ZI", -0.3980), ("ZZ", -0.0113), ("XX", 0.1810)]
)
 
A = np.array(hamiltonian)
eigenvalues, eigenvectors = np.linalg.eigh(A)
print("The ground state energy is ", min(eigenvalues))

Output:

The ground state energy is  -0.702930394459531

There are many prefabricated ansatz choices in Qiskit. We will use EfficientSU2.

# Pre-defined ansatz circuit and operator class for Hamiltonian
from qiskit.circuit.library import EfficientSU2
 
# Note that it is more common to place initial 'h' gates outside the ansatz. Here we specifically wanted this layer structure.
ansatz = EfficientSU2(
    hamiltonian.num_qubits, su2_gates=["h", "rz", "y"], entanglement="circular", reps=1
)
num_params = ansatz.num_parameters
print("This circuit has ", num_params, "parameters")
 
ansatz.decompose().draw("mpl", style="iqp")

Output:

This circuit has  4 parameters
Output of the previous code cell

Different ansätze will have different entangling structures and different rotation gates. The one shown here uses CNOT gates for entangling, with fixed H and Y gates and parametrized RZ gates. Note the size of this parameter space: it means we must minimize the cost function over 4 variables. This can be scaled up, but not indefinitely. Running a similar problem on 4 qubits, using the default 3 reps for EfficientSU2, yields 32 variational parameters, as the quick check below confirms.
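A quick sketch of ours using the same EfficientSU2 class verifies that claim:

from qiskit.circuit.library import EfficientSU2
 
# Default EfficientSU2 (RY and RZ rotation layers) on 4 qubits with reps=3
print(EfficientSU2(4, reps=3).num_parameters)  # 32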

3.2 Step 2: Optimize for target hardware

The ansatz was written using familiar gates, but our circuit must be transpiled to make use of the basis gates that can be implemented on each quantum computer. We select the least busy backend.

# runtime imports
from qiskit_ibm_runtime import QiskitRuntimeService, Session
from qiskit_ibm_runtime import EstimatorV2 as Estimator
 
# To run on hardware, select the backend with the fewest number of jobs in the queue
service = QiskitRuntimeService(channel="ibm_quantum")
backend = service.least_busy(operational=True, simulator=False)
 
print(backend)

Output:

<IBMBackend('ibm_cusco')>

We can now transpile our circuit for this hardware and visualize our transpiled ansatz.

from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
 
target = backend.target
pm = generate_preset_pass_manager(target=target, optimization_level=3)
 
ansatz_isa = pm.run(ansatz)
 
ansatz_isa.draw(output="mpl", idle_wires=False, style="iqp")

Output:

Output of the previous code cell

Note that the gates used have changed, and the qubits in our abstract circuit have been mapped to differently-numbered qubits on the quantum computer. We must map our Hamiltonian identically in order for our results to be meaningful.

hamiltonian_isa = hamiltonian.apply_layout(layout=ansatz_isa.layout)

3.3 Step 3: Execute on target hardware

3.3.1 Reporting out values

We define a cost function here that takes as arguments the structures we have built in previous steps: the parameters, the ansatz, and the Hamiltonian. It also uses the estimator which we have not yet defined. We include code to track the history of our cost function, so that we can check convergence behavior.

def cost_func(params, ansatz, hamiltonian, estimator):
    """Return estimate of energy from estimator

    Also appends each energy estimate to the global cost_history_dict.

    Parameters:
        params (ndarray): Array of ansatz parameters
        ansatz (QuantumCircuit): Parameterized ansatz circuit
        hamiltonian (SparsePauliOp): Operator representation of Hamiltonian
        estimator (EstimatorV2): Estimator primitive instance

    Returns:
        float: Energy estimate
    """
    pub = (ansatz, [hamiltonian], [params])
    result = estimator.run(pubs=[pub]).result()
    energy = result[0].data.evs[0]
 
    cost_history_dict["iters"] += 1
    cost_history_dict["prev_vector"] = params
    cost_history_dict["cost_history"].append(energy)
    print(f"Iters. done: {cost_history_dict['iters']} [Current cost: {energy}]")
 
    return energy
 
 
cost_history_dict = {
    "prev_vector": None,
    "iters": 0,
    "cost_history": [],
}
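Before spending QPU time, it can be useful to check the energy-estimation step locally with the noiseless StatevectorEstimator (an optional step we add here, not part of the original hardware workflow; it uses the untranspiled ansatz and Hamiltonian):

# Optional local check with a noiseless statevector-based estimator
from qiskit.primitives import StatevectorEstimator
 
test_params = 2 * np.pi * np.random.random(num_params)
pub = (ansatz, [hamiltonian], [test_params])
result = StatevectorEstimator().run([pub]).result()
print(result[0].data.evs[0])  # a noiseless energy estimate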

It is highly advantageous if you can choose initial parameter values based on knowledge of the problem at hand and characteristics of the target state. We will make no assumptions of such knowledge and use random initial values.

x0 = 2 * np.pi * np.random.random(num_params)
# This required 13 min, 20 s QPU time on ibm_cusco, 28 min total time.
with Session(backend=backend) as session:
    estimator = Estimator(mode=session)
    estimator.options.default_shots = 10000
 
    res = minimize(
        cost_func,
        x0,
        args=(ansatz_isa, hamiltonian_isa, estimator),
        method="cobyla",
        options={"maxiter": 50},
    )

Output:

Iters. done: 1 [Current cost: 0.28219123773417876]
Iters. done: 2 [Current cost: 0.04857689634940197]
Iters. done: 3 [Current cost: -0.31794536919423433]
Iters. done: 4 [Current cost: -0.4069430404655289]
Iters. done: 5 [Current cost: -0.31151962989734233]
Iters. done: 6 [Current cost: -0.34160711444979264]
Iters. done: 7 [Current cost: -0.3631239245619347]
Iters. done: 8 [Current cost: -0.38424156231763107]
Iters. done: 9 [Current cost: -0.46563838279474973]
Iters. done: 10 [Current cost: -0.48189435351301163]
Iters. done: 11 [Current cost: -0.48580160382191184]
Iters. done: 12 [Current cost: -0.44577270841639954]
Iters. done: 13 [Current cost: -0.4638234527978863]
Iters. done: 14 [Current cost: -0.5416410736954318]
Iters. done: 15 [Current cost: -0.5855358330991166]
Iters. done: 16 [Current cost: -0.5794988569759748]
Iters. done: 17 [Current cost: -0.5847364869597874]
Iters. done: 18 [Current cost: -0.6350690843468105]
Iters. done: 19 [Current cost: -0.6578492739878075]
Iters. done: 20 [Current cost: -0.6634539233552738]
Iters. done: 21 [Current cost: -0.6296298079631729]
Iters. done: 22 [Current cost: -0.6540641555988675]
Iters. done: 23 [Current cost: -0.675312860772722]
Iters. done: 24 [Current cost: -0.6950511793661087]
Iters. done: 25 [Current cost: -0.6717925648149305]
Iters. done: 26 [Current cost: -0.6715683050759771]
Iters. done: 27 [Current cost: -0.6585891663541322]
Iters. done: 28 [Current cost: -0.6829580107480062]
Iters. done: 29 [Current cost: -0.6735112197545985]
Iters. done: 30 [Current cost: -0.6847631879125596]
Iters. done: 31 [Current cost: -0.6749113605798552]
Iters. done: 32 [Current cost: -0.6745707304830706]
Iters. done: 33 [Current cost: -0.688686554504189]
Iters. done: 34 [Current cost: -0.6824317635732478]
Iters. done: 35 [Current cost: -0.6801435940910384]
Iters. done: 36 [Current cost: -0.6838235742717842]
Iters. done: 37 [Current cost: -0.6703267002736204]
Iters. done: 38 [Current cost: -0.68286941128606]
Iters. done: 39 [Current cost: -0.6782013099576968]
Iters. done: 40 [Current cost: -0.6733243747135585]
Iters. done: 41 [Current cost: -0.6795572076871205]
Iters. done: 42 [Current cost: -0.6782381635864793]
Iters. done: 43 [Current cost: -0.6852153370275786]
Iters. done: 44 [Current cost: -0.6792907636500196]
Iters. done: 45 [Current cost: -0.6699327507696534]
Iters. done: 46 [Current cost: -0.6707285595671085]
Iters. done: 47 [Current cost: -0.6772933314233435]
Iters. done: 48 [Current cost: -0.6715509174358688]
Iters. done: 49 [Current cost: -0.6683810729341132]
Iters. done: 50 [Current cost: -0.6759676856300733]

We can look at the raw outputs.

res

Output:

 message: Maximum number of function evaluations has been exceeded.
 success: False
  status: 2
     fun: -0.6950511793661087
       x: [ 7.597e+00  5.482e+00  6.182e+00  1.797e+00]
    nfev: 50
   maxcv: 0.0

3.4 Step 4: Post-process results

If the procedure terminated correctly, then the 'prev_vector' and 'iters' entries in our dictionary should equal the solution vector and the total number of function evaluations, respectively. This is easy to verify:

cost_history_dict

Output:

{'prev_vector': array([7.59742585, 5.48170942, 6.18168668, 1.79713719]),
 'iters': 50,
 'cost_history': [0.28219123773417876,
  0.04857689634940197,
  -0.31794536919423433,
  -0.4069430404655289,
  -0.31151962989734233,
  -0.34160711444979264,
  -0.3631239245619347,
  -0.38424156231763107,
  -0.46563838279474973,
  -0.48189435351301163,
  -0.48580160382191184,
  -0.44577270841639954,
  -0.4638234527978863,
  -0.5416410736954318,
  -0.5855358330991166,
  -0.5794988569759748,
  -0.5847364869597874,
  -0.6350690843468105,
  -0.6578492739878075,
  -0.6634539233552738,
  -0.6296298079631729,
  -0.6540641555988675,
  -0.675312860772722,
  -0.6950511793661087,
  -0.6717925648149305,
  -0.6715683050759771,
  -0.6585891663541322,
  -0.6829580107480062,
  -0.6735112197545985,
  -0.6847631879125596,
  -0.6749113605798552,
  -0.6745707304830706,
  -0.688686554504189,
  -0.6824317635732478,
  -0.6801435940910384,
  -0.6838235742717842,
  -0.6703267002736204,
  -0.68286941128606,
  -0.6782013099576968,
  -0.6733243747135585,
  -0.6795572076871205,
  -0.6782381635864793,
  -0.6852153370275786,
  -0.6792907636500196,
  -0.6699327507696534,
  -0.6707285595671085,
  -0.6772933314233435,
  -0.6715509174358688,
  -0.6683810729341132,
  -0.6759676856300733]}
We can plot the cost history against the exact ground state energy computed earlier:

fig, ax = plt.subplots()
 
# Reference line at the exact ground state energy found by diagonalization
constant = -0.7029
iters = range(cost_history_dict["iters"])
y_constant = np.full(cost_history_dict["iters"], constant)
 
ax.plot(iters, cost_history_dict["cost_history"], label="VQE")
ax.plot(iters, y_constant, label="Target")
ax.set_xlabel("Iterations")
ax.set_ylabel("Cost")
plt.legend()
plt.draw()

Output:

Output of the previous code cell

IBM Quantum has other upskilling offerings related to VQE. If you are ready to put VQE into practice, see our tutorial: Variational Quantum Eigensolver. If you want more information on creating molecular Hamiltonians, see this lesson in our course on Quantum chemistry with VQE. If you are interested in a deeper understanding of how variational algorithms like VQE work, we recommend the course Variational Algorithm Design.

Check-in question

In this section, we calculated a ground state energy from a Hamiltonian. If we wanted to apply this to, say, determining the geometry of a molecule, how would we extend this?

Answer:

We would need to introduce variables for inter-atomic spacing and the angles between bonds, and vary these. For every variation, we would produce a new Hamiltonian (since the operators describing the energy certainly depend on the geometry). For each such Hamiltonian produced and mapped onto qubits, we would need to carry out an optimization like that done above. Of all those converged optimization problems, the geometry that produced the lowest energy would be the one adopted by nature. This is quite a bit more involved than what was shown above. Such a calculation is done for the simplest molecule, $\text{H}_2$, here.


4. VQE's relationship to other methods

In this section we will review the advantages and disadvantages of the original VQE approach and point out its relationships to other, more recent algorithms.

4.1 The strengths and weaknesses of VQE

Some strengths have already been pointed out. They include:

  • Suitability to modern hardware: Some quantum algorithms require much lower error rates, approaching large scale fault tolerance. VQE does not; it can be implemented on current quantum computers.
  • Shallow circuits: VQE often employs relatively shallow quantum circuits. This makes VQE less susceptible to accumulated gate errors and makes it suitable for many error mitigation techniques. Of course, the circuits are not always shallow; this depends on the ansatz used.
  • Versatility: VQE can (in principle) be applied to any problem that can be cast as an eigenvalue/eigenvector problem. There are many caveats that make VQE impractical or disadvantageous for some problems. Some of these are recapped below.

Some weaknesses of VQE and problems for which it is impractical have also been described above. These include:

  • Heuristic nature: VQE does not guarantee convergence to the correct ground state energy, as its performance depends on the choice of ansatz and optimization methods[1-2]. If a poor ansatz is chosen that lacks the requisite entanglement for the desired ground state, no classical optimizer can reach that ground state.
  • Potentially numerous parameters: A very expressive ansatz may have so many parameters that the minimization iterations are very time-consuming.
  • High measurement overhead: In VQE, an estimator is used to estimate the expectation value of each term in the Hamiltonian. Most Hamiltonians of interest will have terms that cannot be simultaneously estimated. This can make VQE resource-intensive for large systems with complicated Hamiltonians[1].
  • Effects of noise: When the classical optimizer is searching for a minimum, noisy calculations can confuse it and steer it away from the true minimum or delay its convergence. One possible solution is to leverage IBM's state-of-the-art error mitigation and error suppression techniques[2-3].
  • Barren plateaus: These regions of vanishing gradients[2-3] exist even in the absence of noise, but noise makes them more troublesome since the change in expectation values due to noise could be larger than the change from updating parameters in these barren regions.

4.2 Relationship to other approaches

Adapt-VQE

The ADAPT-VQE (Adaptive Derivative-Assembled Pseudo-Trotter Variational Quantum Eigensolver) algorithm is an enhancement of the original VQE algorithm, designed to improve efficiency, accuracy, and scalability for quantum simulations, particularly in quantum chemistry.

The original VQE algorithm described throughout this lesson uses a predefined, fixed ansatz to approximate the ground state of the system. In our case, we used EfficientSU2 with a single repetition, built from H and Y gates and parametrized RZ rotation gates. Although the parameters in the RZ gates changed, the structure of this ansatz and the gates used did not change.

ADAPT-VQE addresses the limitations of VQE through adaptive ansatz construction. Instead of starting with a fixed ansatz, ADAPT-VQE dynamically builds the ansatz iteratively. At each step, it selects the operator from a predefined pool (such as fermionic excitation operators) that has the largest gradient with respect to the energy. This ensures that only the most impactful operators are added, leading to a compact and efficient ansatz[4-6]. This approach can have several beneficial effects:

  1. Reduced Circuit Depth: By growing the ansatz incrementally and focusing only on necessary operators, ADAPT-VQE minimizes gate operations compared to traditional VQE approaches[5,7].
  2. Improved Accuracy: The adaptive nature allows ADAPT-VQE to recover more correlation energy at each step, making it particularly effective for strongly correlated systems where traditional VQE struggles[8,9].
  3. Scalability and Noise Robustness: The compact ansatz reduces the accumulation of gate errors, reduces computational overhead, and limits the number of variational parameters which must be minimized.

ADAPT-VQE is still not perfect. In some cases it can become trapped or slowed by local minima, and it may suffer from over-parameterization. It can also be fairly resource intensive, since it requires the calculation of gradients and parameter optimization with many gate structures.

Quantum phase estimation (QPE)

QPE is similar in purpose to VQE, but very different in implementation. QPE requires fault-tolerant quantum computers due to its generally deep quantum circuits and the high level of coherence it requires. Once QPE can be implemented, it will be more precise than VQE. One way of describing the difference is through the precision as a function of circuit depth. QPE achieves precision $\epsilon$ with circuit depths scaling as $O(1/\epsilon)$ [10]. VQE requires $O(1/\epsilon^2)$ samples to achieve the same precision [10,11].

Krylov, SQD, QSCI & others in this course

VQE helped establish quantum algorithms that still depend on classical computers, not just for operating the quantum computer, but for substantial parts of the algorithm. Several such algorithms are the focus of the remainder of this course. Here, we give a cursory explanation of a few, simply to compare and contrast them to VQE. They will be explained in much greater detail in subsequent lessons.

Krylov quantum diagonalization (KQD)

Krylov subspace methods are ways of projecting a matrix onto a subspace to reduce its dimension and make it more manageable, while keeping the most important features. One trick in this method is to generate a subspace that keeps these features; it turns out that generating this subspace is closely related to a well-established method on quantum computers called Trotterization.

There are a few variants of quantum Krylov methods, but generally the approach is:

  • Use the quantum computer to generate a subspace (the Krylov subspace) through Trotterization
  • Project the matrix of interest onto that Krylov subspace
  • Diagonalize the new projected Hamiltonian using a classical computer

Sampling-based quantum diagonalization (SQD)

Sampling-based quantum diagonalization (SQD) is related to the Krylov method in that it also attempts to reduce the dimension of a matrix to be diagonalized while preserving key features. SQD does this in the following way:

  • Begin with an initial guess for your ground state and prepare the system in that guess state.
  • Use Sampler to sample the bitstrings that make up this state.
  • Use the collection of computational basis states from sampler as the subspace onto which you project your matrix of interest.
  • Diagonalize the smaller, projected matrix using a classical computer.

This is related to VQE in that it leverages classical and quantum computing for substantial algorithm components. They both also share the requirement that we prepare a good initial guess or ansatz. But the distribution of work between the classical and quantum computers in SQD is more like that of the Krylov method.

In fact, the Krylov method and SQD have recently been combined into the sampling-based Krylov quantum diagonalization (SKQD) method [12].

Quantum selected configuration interaction (QSCI)

Quantum selected configuration interaction (QSCI)[13] is an algorithm that produces an approximate ground state of a Hamiltonian by sampling a trial wave function to identify the significant computational basis states, which form a subspace for classical diagonalization. Both SQD and QSCI use a quantum computer to construct a reduced subspace. QSCI's additional strength is in its state preparation, especially in the context of chemistry problems. It leverages various strategies such as using time-evolved states [14] and a set of chemistry-inspired ansätze. By focusing on efficient state preparation, QSCI reduces quantum computational costs for chemical Hamiltonians while maintaining high fidelity and leveraging the noise robustness of quantum state sampling techniques [15]. QSCI also provides an adaptive construction technique that generates additional ansätze for a better result.

The default workflow of QSCI for a chemistry problem is as follows:

  • Build the molecular Hamiltonian using your software of choice (such as PySCF).
  • Prepare a QSCI algorithm by selecting a proper initial state and a chemistry-inspired ansatz with a pre-selected set of parameters.
  • Sample significant basis states and diagonalize the Hamiltonian using a classical computer to obtain the ground state energy.
  • Often one uses configuration recovery [16] and symmetry post-selection [15] as a post processing technique.
  • Optionally, adaptive QSCI adds an optimization loop between the second and third steps, using additional ansätze with random initial states.

Check-in question

What does VQE have in common with all the other methods listed above (except QPE, which is not described in great detail)?

Answer:

All involve a trial state or wave function of some sort. All work best when the initial guess for this trial state is excellent.

Another correct answer is that they are all easiest to implement when the Hamiltonian is easy to measure (can be sorted into relatively few groups of commuting Pauli operators).

Check-in question

What does VQE have in common with none of the other methods listed above?

Answer:

Classical optimizers. None of the others use classical optimization algorithms to select variational parameters.


References

[2] https://en.wikipedia.org/wiki/Variational_quantum_eigensolver

[3] https://journals.aps.org/prapplied/abstract/10.1103/PhysRevApplied.19.024047

[4] https://arxiv.org/abs/2111.05176

[6] https://inquanto.quantinuum.com/tutorials/InQ_tut_fe4n2_2.html

[7] https://www.nature.com/articles/s41467-019-10988-2

[8] https://arxiv.org/abs/2210.15438

[9] https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.6.013254

[10] https://arxiv.org/html/2403.09624v1

[11] https://www.nature.com/articles/s42005-023-01312-y

[13] https://arxiv.org/abs/1802.00171

[14] https://arxiv.org/abs/2103.08505

[15] https://arxiv.org/html/2501.09702v1

[16] https://quri-sdk.qunasys.com/docs/examples/quri-algo-vm/qsci/

[17] https://arxiv.org/abs/2412.13839

[18] https://arxiv.org/abs/2302.11320v1

[19] https://arxiv.org/pdf/2405.05068v1
