
Compute resources and resource management


Resource model and classical resources

In this section, we provide a framework for thinking about compute environments that applies to everything from a laptop to a supercomputer. By the end of this section, you will understand the fundamental components of a compute environment and how they relate to each other. This is all outlined by Iskandar Sitdikov in the following video.

Resource model

Any classical compute environment is built from several interrelated resources that work together to run applications efficiently. The key resources typically include:

  • CPU (Central Processing Unit): The CPU is the core processing unit that interprets and executes program instructions. It handles logic, arithmetic, and control operations, essentially acting as the "brain" of the system.

  • CPU cache (L1, L2, L3): This is the fastest memory in the system, built directly into or very close to the CPU core. It stores tiny portions of data and instructions that the CPU needs immediately. The different levels (L1, L2, L3) represent a trade-off: L1 is the smallest and fastest while L3 is the largest and slowest, but still orders of magnitude faster than RAM.

  • RAM (Random Access Memory): Volatile memory that provides large, temporary storage for program instructions and actively used data. It ensures the CPU can quickly access the information it needs during execution without constant reliance on slower storage devices.

  • Storage (local and network-based): Storage retains data and software even when the system is powered off, providing long-term persistence for large datasets and applications. In high-performance computing, storage solutions must handle vast amounts of scientific or analytical data with both speed and reliability. Local storage includes solid-state drives (SSDs) and hard disk drives (HDDs), with SSDs favored for their lower latency and higher throughput. For large-scale data handling, parallel file systems, shared network storage, and object-based systems enable rapid access across many compute nodes, while cloud and archival storage tiers support long-term retention and scalability.

  • GPU (Graphics Processing Unit): While initially designed for rendering graphics, modern GPUs are powerful parallel processors. They are widely used for handling tasks that require many simultaneous calculations, like deep learning, physics simulations, and big data analytics. It is important to note that GPUs are not replacing CPUs; CPUs direct higher-level program logic while GPUs accelerate highly parallel steps.

  • Connections/Busses: These are communication pathways that link the CPU, memory, storage, and peripherals. Busses enable data transfer and coordination between system parts, ensuring smooth communication across the compute environment. In HPC systems, components like CPUs, GPUs, and storage devices are linked by high-speed interconnects that enable rapid data exchange. GPUs commonly connect to the system via PCIe, a standard interface with multiple data lanes for efficient communication. For higher performance, NVLink provides a direct, high-bandwidth link between GPUs or between GPUs and CPUs, reducing latency and accelerating parallel workloads.

  • Filesystem: The filesystem organizes data on storage devices. It provides structure for storing, retrieving, and managing files, enabling programs and users to access information in a consistent and logical manner.
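To make these resources concrete, here is a minimal sketch (Python standard library only) that inspects a couple of them on whatever machine it runs on; the values reported will vary from system to system.

import os
import shutil

# Number of logical CPU cores visible to this process
print(f"CPU cores: {os.cpu_count()}")

# Capacity of the filesystem backing the current directory
usage = shutil.disk_usage(".")
print(f"Storage: {usage.total / 1e9:.1f} GB total, {usage.free / 1e9:.1f} GB free")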

Each type of resource has its own performance-related units of measurement. For example, CPUs are typically measured by "cores" and "clock speed." When purchasing a laptop, its specifications usually include the number of cores. A similar concept applies to computational nodes in a data center, where each node is associated with a specific number of cores. Computing environments that include multiple resource types (CPUs, GPUs, even QPUs) are referred to as heterogeneous computing environments. These setups handle diverse workloads more efficiently by leveraging the strengths of each processor type. For instance, CPUs would be used for general tasks and GPUs for parallel processing. In the context of resource management and scheduling - especially for heterogeneous computing environments - additional units of measurement may be required along with those described here.

For memory, the typical units of measure are megabytes, gigabytes, and terabytes.

For graphics cards and other accelerators, the unit of measure depends on the context. While their true computing capability is measured by fine-grained metrics - number of processing cores, memory size, and memory bandwidth - in high-level discussions of cluster resources or job scheduling, GPUs and similar accelerators can be quantified at the device level by the number of whole devices assigned (for example, three GPUs).

Network/connectivity/busses are a crucial aspect of any compute infrastructure, as they dictate how fast data is transferred between compute components. From the CPU to its cache, to RAM, to PCIe cards, to network-connected devices: all of it is communication, and it is crucial to have an accurate mental model of it to design highly optimized algorithms for HPC.
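As a toy illustration of these units, a scheduler might describe one node with a record like the following; the field names are purely illustrative and not taken from any particular resource manager.

# A hypothetical resource descriptor for one node in a heterogeneous cluster
node_spec = {
    "cpus": 64,              # CPU cores
    "memory_gb": 512,        # RAM in gigabytes
    "gpus": 4,               # whole accelerator devices assigned
    "interconnect": "PCIe",  # bus linking devices within the node
}
print(node_spec)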

An image showing that each computing node may include many types of resources.

Scaling classical resources

High-Performance Computing (HPC) involves scaling these classical resources to achieve faster processing times or increase the data that can be simultaneously handled (for example, to increase the size of solutions spaces that can be searched). This can be achieved through:

  • Vertical scaling: Increasing the power of individual resources, such as using a more powerful CPU or adding more memory within one physical node, where a node is a unit of a compute cluster encapsulating multiple computing resources within it.

  • Horizontal scaling: Adding more resources, such as multiple CPUs or GPUs, to work together on a single node or, more commonly, on multiple nodes, enabling distributed computation.

An image showing vertical scaling of resources through placing more resources, like memory, within a single node, and horizontal scaling through increasing the number of connected nodes including different resource types.

Some of the scaling concepts from this section will be applicable to the next section on quantum computing resources. Some other aspects of quantum resources will be quantified in new ways.

Check your understanding

Use the descriptions above to infer some advantages and disadvantages of the two scaling approaches: vertical and horizontal.

Answer:

There may be many correct answers. Vertical scaling is often simpler, especially if you have predictable workloads that will need a fixed amount of resources. But vertical scaling could be more expensive to upgrade, since the fundamental unit of computing cannot be broken down as easily as in horizontal scaling. Horizontal scaling is more complex to manage and sometimes there are difficulties or latencies related to connections between nodes. But it is much more adaptive to varying resource requirements and is modular when upgrades are required.


New type of resource: QPU (Quantum Processing Unit)

In this section, we will introduce a new type of resource - a quantum resource - and explore its definition, units of measure, and connectivity to classical infrastructure.

Definition of QPU

  • Quantum processing unit (QPU): A QPU includes all of the hardware responsible for accepting an executable quantum instruction set, or a quantum circuit, and returning an accurate answer.

That means the QPU includes one or more quantum chips (e.g. Heron), the multiple additional components in the dilution refrigerator such as the quantum amplifiers, the control electronics, and the classical compute required for tasks such as holding the instructions and waveforms in memory, accumulating results, and future error correction decoding. While a dilution refrigerator is required to perform these tasks, we exclude the dilution refrigerator from this definition to allow for the case of multiple QPUs in the same fridge.

  • Quantum computer: A quantum computer comprises the QPU plus the classical compute which hosts the runtime environment.

  • Runtime environment: The combination of hardware and software that makes it possible to run a program.

Layers in quantum circuits

In both classical and quantum computing, processes may be executed sequentially or in parallel. Because qubits have a rich state space compared to classical bits, it sometimes makes sense for multiple single-qubit gates to be executed on a qubit in sequence (like an R_x gate followed by an R_z gate). Since entanglement between qubits is critical to quantum computing, it is also common for a quantum circuit to have a set of entangling gates acting across many qubits. These and other factors make it common to identify processes that can be executed in parallel on the scale of individual gate operations in a quantum circuit. In classical computing, bit-level parallelism is also possible but less commonly considered at the gate level; it is more common to refer to parallel and sequential processes at a larger scale.

In quantum computing, one refers to a "layer" of gates which can all be executed simultaneously. In many applications it is useful to perform a set of rotations on all qubits and then entangling gates between pairs of qubits. In these contexts, one refers to a "rotation layer" (a layer of gates like R_x, R_y, and/or R_z) and an "entangling layer" (like one with CNOT gates). The number of layers in a circuit is the "circuit depth", an important measure since greater depth means more layers of compounding noise and errors.

It can be difficult to identify gate layers visually when the layers are not aligned using barriers. In Qiskit, a barrier serves as an instruction in quantum circuits that acts as a visual separator and a constraint during compilation. Both in drawing the circuit and executing it, no gates will be moved across the barrier. This can be important in contexts like dynamical decoupling, in which one intentionally implements gates that simplify to an identity to suppress certain error types. For more on dynamical decoupling see this guide. For the visual effect of barriers, compare these two images of the same circuit, the first without barriers and the second with barriers to force alignment of layers.

Four-qubit quantum circuit with no barriers to force alignment of layers; gates appear somewhat randomly aligned.

Four-qubit quantum circuit with barriers in place to force the alignment of layers. Counting layers is much easier now.

These are the same circuit and have the same number of layers. But in the second, the alignment makes it easy to see that the circuit has:

  • Two rotation layers: one around the Y axis by π/5, one around the Z axis by π/4.
  • Three entangling layers. Note that one can call each CNOT a "layer" on its own, since the CNOTs cannot be re-ordered to be parallel without changing the logical operation.
  • Two more rotation layers: one around the Y axis by π/3, one around the Z axis by π/2.
  • Two more entangling layers. Note that this time the first layer has been slightly more parallelized than in the first set of entangling layers.

The depth of each circuit is 9.
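The sketch below rebuilds a circuit with this layer structure in Qiskit and confirms the depth; the exact CNOT placement is an assumption chosen so that the layer counts match the description above.

from math import pi
from qiskit import QuantumCircuit

qc = QuantumCircuit(4)

# Two rotation layers: RY(pi/5) on all qubits, then RZ(pi/4) on all qubits
for q in range(4):
    qc.ry(pi / 5, q)
qc.barrier()
for q in range(4):
    qc.rz(pi / 4, q)
qc.barrier()

# Three entangling layers: each CNOT shares a qubit with the next,
# so none of them can execute in parallel
qc.cx(0, 1)
qc.barrier()
qc.cx(1, 2)
qc.barrier()
qc.cx(2, 3)
qc.barrier()

# Two more rotation layers: RY(pi/3), then RZ(pi/2)
for q in range(4):
    qc.ry(pi / 3, q)
qc.barrier()
for q in range(4):
    qc.rz(pi / 2, q)
qc.barrier()

# Two more entangling layers: cx(0,1) and cx(2,3) act on disjoint qubits,
# so they share one layer; cx(1,2) overlaps both and needs its own layer
qc.cx(0, 1)
qc.cx(2, 3)
qc.barrier()
qc.cx(1, 2)

# Barriers are directives and do not count toward depth by default
print(qc.depth())  # 9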

Units of measure

In quantum computing, the capabilities of a quantum system are typically assessed using three key performance metrics: scale, quality, and speed. These metrics not only describe the computational potential of a quantum device, but also inform how resources are managed and scheduled in practical applications.

  • Scale refers to the number of quantum bits (qubits) in the system, representing how much quantum information the device can hold. In resource management, this directly impacts the circuit width—the number of qubits required to run a given quantum task. A quantum unit must have sufficient qubits to support the assigned task.

  • Quality describes how accurately quantum operations are performed. It is often quantified by layer fidelity, which measures the accuracy of executing a full layer of quantum gates across all qubits. From a scheduling perspective, higher fidelity allows for deeper circuits to be executed reliably, affecting the need for error mitigation or task decomposition.

  • Speed is measured by CLOPS (Circuit Layer Operations Per Second), indicating how many layers of quantum operations the system can execute per second. This affects throughput and latency in task execution, and helps determine how quickly a quantum unit can complete a given workload. This speed is especially important on a quantum computer, since qubits suffer from noise and errors to a greater extent than their classical counterparts. The time over which they can retain their quantum information in a useful way is described by the coherence time, typically on the order of 200-300 μs for Heron r3 processors.

Differences between quantum and classical metrics

You may think of CLOPS as a loose quantum analog to FLOPS, but with some key differences. CLOPS measures the speed at which a quantum processor can execute quantum circuits, specifically layers of operations within the circuits, including both quantum and necessary classical computations involved in running circuits. It was developed by IBM Quantum as a holistic measure of a quantum computer's execution speed, covering quantum execution time and real-time classical processing needed for circuit updates, unlike FLOPS which purely measures floating-point arithmetic capacity in classical processors.

CLOPS provides a measurable performance metric that can be benchmarked on existing hardware. IBM Quantum has used CLOPS to benchmark different quantum processors, and the values can be found on the Compute resources page on IBM Quantum Platform. CLOPS values depend on hardware capabilities, gate speeds, classical processing speed, and how well these are integrated.

The qubit count is a fixed number for a given QPU. The CLOPS and quality depend on regular calibration and maintenance and can vary slightly over time, even for a single QPU.
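If you have saved IBM Quantum account credentials, scale can be queried programmatically, as in this minimal sketch (the backend name is illustrative; availability varies by account):

from qiskit_ibm_runtime import QiskitRuntimeService

service = QiskitRuntimeService()         # assumes saved credentials
backend = service.backend("ibm_torino")  # illustrative device name
print(f"{backend.name}: {backend.num_qubits} qubits")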

Together, these metrics guide how quantum systems are allocated and scheduled. In many cases, the entire quantum system is treated as a single unit. However, when a task exceeds the capacity of one unit—whether in terms of qubit count, circuit depth, or execution speed—techniques such as circuit cutting/knitting can be used. Circuit cutting is the process of breaking down large quantum tasks into smaller, manageable sub-tasks that can be distributed across multiple quantum chips, enabling scalable quantum computation despite hardware limitations. Circuit knitting refers to the process that comes after circuit cutting — the classical post-processing step that "knits" or combines the results from the smaller subcircuits back together.

Quantum computers do not have traditional memory, in the sense of persistent, addressable storage like RAM or GPU memory. Classical computing resources have discrete bits stored in memory, allowing data to be saved, fetched, and reused during computation. Quantum resources use qubits, which do not store memory in the classical sense. Instead, qubits exist in quantum states that represent superpositions of 0 and 1 simultaneously, enabling exponential parallelism in state space. However, qubit states are fragile and cannot be cloned or read deterministically at intermediate steps without collapsing the quantum state, so persistent memory-like behavior during computation does not exist. Qubits must be maintained in a coherent state throughout execution, and the "memory" is essentially the quantum state itself. Classical memory can only be used alongside a quantum processor, not as internal quantum memory. This has significant implications: classical compute resources can reuse and store intermediate results freely; quantum resources cannot do this without measurements, which disturb the computation.

Connectivity to classical infrastructure

QPUs can be connected to classical infrastructure through networks and various application programming interfaces (APIs) that allow software developers to interact with QPUs programmatically. These APIs are usually hidden behind software development kits (SDKs) and libraries (like Qiskit) and exposed to computational scientists in the form of programming abstractions (like Qiskit Primitives, which we will be talking about in Chapter 3: programming models).

It is worth drawing a distinction between tight and loose integration of quantum and classical resources. Currently QPUs are not on the same node as classical compute resources. In fact, QPUs are not currently connected via PCIe, but via the network. This could change in the future, but there are engineering challenges related to environmental conditions optimal for QPUs and classical compute resources.

Scaling quantum resources

Scaling of quantum resources can also be categorized into vertical and horizontal.

  • Vertical scaling would be increasing the number of qubits per chip or improving the fidelity of devices.
  • Horizontal scaling would be connecting chips with couplers or with classical interconnect.

An image showing vertical scaling of quantum resources as more qubits on a chip, and horizontal scaling of quantum resources as connecting many chips together with couplers.

Check your understanding

What are the quantum analogs of classical (a) bits of information, and (b) processor speed?

Answer:

(a) Quantum bits or qubits - units of information which unlike their classical counterparts (which can only adopt the state 0 or 1), can be in a superposition of 0 and 1 simultaneously.

(b) Circuit layer operations per second or CLOPS - number of sequential operations the QPU can perform each second, including some interfacing with classical computing resources, as in loading parameters from the circuit.


Resource management

HPC and quantum resources are both precious and complex; they must be managed carefully. In this section, we will explain how to manage resources for user programs. Resource management in compute infrastructure refers to the process of (1) planning, (2) allocating, and (3) controlling/managing the use of computing resources such as CPUs, memory, storage, and network bandwidth to ensure efficient and effective resource utilization.

Planning - resource estimation

Any program consumes resources, and estimating the resources required is crucial for efficient resource management. This includes estimating the amount of CPU, memory, and other resources needed to execute a program. The same can be said for quantum resources. However, quantum resources exist on a completely different scale. IBM Quantum® Heron r3 quantum processors have 156 qubits, compared to the many billions of classical bits on a common laptop. Time and cost are also considerations. Currently, IBM Quantum has a free plan, the Open Plan, that allows users to explore quantum computing using 10 minutes of QPU time per month. Some research organizations require so much QPU time that they have a dedicated IBM quantum computer on premises.

One step in resource estimation that is unique to quantum computing is the circuit depth. As mentioned earlier, each quantum gate and each delay time between operations comes with noise and a certain probability of an error. The deeper the quantum circuit, the greater the noise. There are two subtleties to this: two-qubit gates have much higher error rates than single-qubit gates, so one can often ignore the single-qubit depth. Further, not all qubits on a quantum chip are directly connected. Sometimes information has to be swapped from qubit to qubit in order to perform the required entanglements and this swapping process itself requires two-qubit gates. That swapping is handled in a process called "transpilation", a complex process that serves other purposes as well; this is discussed in more detail in the next lesson. The relevant limiting quantity is thus the transpiled two-qubit depth. The exact maximum depth at which high-fidelity results can be obtained depends on the circuit. But leveraging modern error mitigation techniques, one can obtain high-fidelity results with transpiled two-qubit depths of 80 or more.
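As a sketch of this estimation step, the snippet below transpiles a small circuit for a mock backend (FakeTorino from qiskit-ibm-runtime, assumed available in your installed version) and counts only two-qubit gates toward depth.

from qiskit import QuantumCircuit
from qiskit.transpiler import generate_preset_pass_manager
from qiskit_ibm_runtime.fake_provider import FakeTorino

# A small circuit whose ring of CNOTs may not match the chip connectivity
qc = QuantumCircuit(5)
qc.h(0)
for q in range(4):
    qc.cx(q, q + 1)
qc.cx(4, 0)  # closing the ring can force SWAP insertion during routing

backend = FakeTorino()
pm = generate_preset_pass_manager(backend=backend, optimization_level=1)
isa_circuit = pm.run(qc)

# Depth counting only two-qubit gates, the dominant error source
two_qubit_depth = isa_circuit.depth(lambda instr: instr.operation.num_qubits == 2)
print("Transpiled two-qubit depth:", two_qubit_depth)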

Allocating - scheduling

Scheduling is the process of allocating resources to programs and managing their execution. This involves:

  • Job submission: The process by which a user sends a request (job) to the HPC system, specifying what computational work and resources are needed for execution.
  • Resource allocation: The assignment of available HPC system resources (such as nodes, CPUs, memory) to a submitted job based on its requirements.
  • Job execution: The actual running of the computational tasks defined by the job on the allocated HPC resources.

There are analogs of all these processes for quantum computers.

  • Jobs are submitted by the user, leveraging Qiskit Runtime, and typically using a Qiskit Runtime primitive, like Sampler, Estimator, or others.
  • The user selects from a list of backends to which they have access. The complete list of available backends can be seen on the Compute resources page on IBM Quantum Platform. It is common to simply use the least busy quantum computer. But there are cases where it might be important to use a specific one because of device layout considerations, replication of previous calculations, and so on.
  • The execution of quantum jobs is similar to the HPC case. Although some differences have already been outlined, a few are worth repeating here. QPUs are currently not generally located on the same node as classical compute resources but are connected through a network. This may have scheduling implications. Further, quantum computers may have substantial queue times, and these queue times vary, making precise control of timing difficult. This situation may be different for dedicated systems; that depends on the quantum computer's internal administration.
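For example, selecting the least busy device in code might look like this minimal sketch, assuming saved account credentials:

from qiskit_ibm_runtime import QiskitRuntimeService

service = QiskitRuntimeService()  # assumes saved credentials
backend = service.least_busy(operational=True, simulator=False)
print("Least busy device:", backend.name)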

Controlling/Managing - workload management

Workload management, also known as orchestration, is the process of managing multiple programs and their resource requirements. This involves:

  • Resource provisioning: The process of preparing and making HPC resources available and ready for use by jobs, including hardware and software setup. As we will see later, QPUs are computing resources that can be provisioned similarly to classical HPC resources, with the caveats from the previous section.
  • Job scheduling: The activity of the scheduler software in deciding which jobs run, when, and on which resources, managing priorities and queues to efficiently utilize the HPC system. Although this broad statement applies to quantum resources, there may be less control over timing than with other resources.

An image showing workloads (shown as boxes) being organized and arranged to fit optimally into a two-dimensional grid with one axis representing time and the other representing resources.

Example:

Consider a well-known task as a context for understanding resource management: finding the prime factors of large numbers. Let us further assume that the algorithm being used relies on brute force checking of every potential divisor. While this is often not the most efficient method, it is easy to understand how the workload might be managed.

Planning - resource estimation

  • Estimate how much CPU time and memory the prime factorization might require.
  • Plan the parallelization of your task - how many CPUs/cores will you use?

Allocating - scheduling

  • Upon job submission, the scheduler assigns CPU cores and memory to the prime factorization task. For example, it might allocate all potential divisors ending in the digits 1, 3, 7, 9 to one of four cores, respectively.
  • Job execution: The prime factorization algorithm runs, performing divisions or other factorization steps on the allocated resources until the task completes.

Controlling/Managing - workload management

  • The system orchestrates the order and timing of prime factorization jobs to optimize throughput.
  • The easiest case to imagine is that one of the cores finds the target prime factor. This should stop the calculation on other cores so they can be used for the next task.
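A minimal sketch of this workload in Python follows, assuming a brute-force split of the divisor range across four worker processes and a made-up semiprime; pool.terminate() plays the role of freeing the other cores once a factor is found.

import math
from multiprocessing import Pool

N = 1_000_003 * 999_983  # a made-up semiprime to factor

def search_slice(bounds):
    """Trial division over one slice of candidate divisors."""
    start, stop = bounds
    for d in range(start, stop):
        if N % d == 0:
            return d
    return None

if __name__ == "__main__":
    limit = math.isqrt(N) + 1
    chunk = limit // 4
    slices = [(max(2, i * chunk), limit if i == 3 else (i + 1) * chunk)
              for i in range(4)]
    with Pool(4) as pool:
        # imap_unordered yields results as workers finish, so we can react
        # as soon as any core finds the factor
        for factor in pool.imap_unordered(search_slice, slices):
            if factor is not None:
                print(f"Found factor: {factor}")
                pool.terminate()  # stop the other cores for the next task
                break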

High-performance computing environments use special software to carry out these steps and manage resources. In the next section, we will learn about a widely-adopted resource management software system: Slurm.

Example with quantum resources:

A workflow that will be the subject of other lessons in this course is determining chemical ground states and energies using sample-based quantum diagonalization (SQD). This is covered in more detail in Lesson 4, and you can also visit this course on SQD and related methods on IBM Quantum Learning. All we need to know for this discussion is that the workflow involves the following:

  • Prepare a quantum circuit
  • Measure the quantum circuit
  • Use the measurement results to project the problem into a useful subspace
  • Diagonalize a smaller, projected matrix using classical computing resources
  • Iterate, either to ensure self-consistency through considerations like charge conservation, or to update the quantum circuit if it has variational parameters.

Planning - resource estimation

  • Map the electronic orbitals to qubits to establish the number of qubits you need.
  • Combine the mapped Hamiltonian of the system and the (possibly variational) state into a quantum circuit and check the transpiled, two-qubit depth. Ensure it is reasonable.
  • Estimate the size of the subspace into which you will project; from this, estimate how much CPU time and memory the diagonalization might require.
  • Plan the parallelization of your task - how many CPUs/cores will you use?

Allocating - scheduling

  • The user selects the QPU; the process of transpilation automatically maps qubits in your abstract quantum circuit to physical qubits on the QPU. This is important since the abstract circuit may assume direct connectivity that does not exist on the chip, among other reasons.
  • Upon job submission via Qiskit Runtime, the job enters the queue for the selected QPU. The user has no control over the queue time, though this may be different for dedicated systems.
  • Classical computing resources await the quantum results.
  • A diagonalization job is submitted to HPC resources; upon job submission, the scheduler assigns CPU cores and memory to the diagonalization task.
  • Job execution: The diagonalization algorithm runs, diagonalizing the smaller projected matrix until the task completes.

Controlling/Managing - workload management

  • The system orchestrates the order and timing of quantum and classical steps throughout. For example, once the projected matrix has been diagonalized and a ground state energy obtained, depending on convergence criteria, the workflow may loop back to a new quantum circuit (with a new variational parameter).
  • When convergence criteria are met by the ground state energy, the calculation on all cores stops.
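To make this orchestration pattern concrete, here is a heavily simplified control-loop sketch; every function is a hypothetical stub standing in for the real quantum and classical steps, not an API from Qiskit or any SQD library.

def run_quantum_sampling(params):
    # Stub for the QPU step: submit a circuit, wait in the queue, get counts
    return {"0011": 512, "1100": 512}

def project_and_diagonalize(counts):
    # Stub for the classical HPC step: project and diagonalize the matrix
    return -1.0, "ground_state"

def update_parameters(params, energy):
    # Stub for choosing new variational parameters for the next circuit
    return params

params, previous_energy = [0.1, 0.2], None
for iteration in range(10):
    counts = run_quantum_sampling(params)            # quantum job (queued)
    energy, state = project_and_diagonalize(counts)  # classical job (scheduled)
    if previous_energy is not None and abs(energy - previous_energy) < 1e-6:
        break                                        # convergence criteria met
    previous_energy = energy
    params = update_parameters(params, energy)       # loop back with a new circuit
print("Converged energy:", energy)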

It is important to note that Slurm, the widely-adopted resource management software we cover in the next section, does not have tools for all steps described above. Slurm does not provide support for planning jobs, nor detailed workload management such as communication between workload components. This is well-suited to the current state of quantum computing in HPC, since QPUs are typically accessed over the network.

Check your understanding

Suppose you are trying to search an unsorted database to find an element which we will call the 'target'. For each of the following actions, state to which stage of resource management it corresponds: (a) Estimating the size of the database and time required to check each element (b) Ensuring that finding the target on one GPU stops the process on other GPUs to free them up for the next problem. (c) Splitting the search space into regions for each of your (say 10) GPUs to search

Answer:

(a) Planning (b) Controlling/managing (c) Allocating/scheduling


Software: Slurm

In this section, we will apply the concepts learned in this chapter to practice using the popular resource management system, Slurm.

Introduction to Slurm

Slurm is an open-source resource management system widely used in high-performance computing environments. It provides a comprehensive set of tools for managing resources, scheduling jobs, and monitoring system performance.

We will cover the basics of using Slurm, including:

  • Job submission
  • Resource allocation
  • Job monitoring

Since it is really hard to provide HPC resources to each student of this course, we will cheat a bit and provide you with a repository of Docker images that mimic an actual HPC cluster with Slurm, but on a small scale. This will help us practice the concepts we have learned in a safe, reproducible environment.

Note that currently all quantum and classical resources are allocated for the duration of the whole experiment. Currently there is no interleaved allocation of mixed resources. A final caveat is that even once the job launches, the quantum system will not be directly controlled as a frequent HPC user might expect. The job is launched on an arbitrary x86-node, which executes the Qiskit Runtime service and that runtime service connects to another scheduler over which the user has no direct control. This workflow and related issues may be known to HPC users who had early experience pursuing exclusive access to GPU nodes (the original use of gres).

Installation instructions and overview of setup

In order to practice combining quantum and HPC resources, you will either need access to a real HPC environment or you will need to simulate an HPC environment on your local machine. An installation guide for local setup using Docker can be found in this repository. The guide links out to support for setting up an IBM Cloud® account and installing the SPANK plugin for QRMI. Also in that repository are several Python files to test your environment.

Once you have completed the installation, let's use the command below to check the compute resources of Slurm from your terminal. If the installation was successful, you should be able to confirm a total of three virtual nodes.

$ sinfo
 
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal      up 5-00:00:00     2  idle c[1-2]
quantum*    up   infinite     1  idle q1
$ scontrol show node
 
NodeName=q1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=1 CPULoad=0.34
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=qpu:1
NodeAddr=q1 NodeHostName=q1 Version=21.08.6
...
 

We have two partitions, or node groups: normal and quantum. The normal partition is composed of nodes that have access to classical resources only. The quantum partition has access to quantum resources. You can see the details of each node by executing scontrol show node.

Run a simple hello world example in Slurm

Let's first run a simple classical hello world example with Slurm. We will use Python for examples. Let's create hello_world.py, which is self-explanatory.

$ vim hello_world.py
 
import time
time.sleep(10)
print("Hello, World!")
~

Now we need to tell the resource manager what resources we need to execute this program. Slurm provides a way to specify all metadata for the job via a submission script, which is just a shell script with Slurm-specific annotations. These annotations allow you to specify resource requirements, schedule parameters, and more. Let's create a hello_world.sh shell script for this.

$ vim hello_world.sh
 
 
#!/bin/bash
#SBATCH --job-name=hello-world
#SBATCH --output=hello-world.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=normal
 
srun python hello_world.py
~
 

Let’s go over the submission file and see what is happening here.

#SBATCH directives are annotations that tell Slurm what requirements we have for program execution. Here you can specify the amount of resources - the number of nodes, the number of tasks per node, and the number of CPUs per task - and other options like the name of the output file. A full list of options is available in the documentation for Slurm.

Now it's time to run our Slurm job. sbatch is a command that accepts a submission file and enqueues the job for execution in Slurm.

$ sbatch hello_world.sh
 
Submitted batch job 63

Let's check the status of our program using the squeue command.

$ squeue
 
#             JOBID PARTITION     NAME         USER ST       TIME  NODES NODELIST(REASON)
#                63 normal        hello-world  root R        0:01      1 c1
 

Once the job has finished, we can check the result by looking at the output file.

$ cat hello-world.out
Hello, World!
 

Check your understanding

Given the Slurm shell script below, what is (a) the name of the job, (b) the name of the Python file, and (c) the name of the output file? (d) Finally, could this use quantum resources or not?

$ vim hello_learner.sh


#!/bin/bash
#SBATCH --job-name=hello-learner
#SBATCH --output=hello-learner.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=quantum

srun hello_learner_qm.py

Answer:

(a) hello-learner (b) hello_learner_qm.py (c) hello-learner.out (d) Yes it could. It is using the quantum partition.

Run a simple Qiskit hello world example in Slurm

Next, let's try to use quantum resources as well. Let's make and run a simple "Hello, Qiskit" program that uses quantum resources.

$ vim hello_qiskit.py
 
# hello_qiskit.py
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp
from qiskit.transpiler import generate_preset_pass_manager
from qiskit_ibm_runtime import EstimatorV2 as Estimator
 
# Create a new circuit with two qubits
qc = QuantumCircuit(2)
 
# Add a Hadamard gate to qubit 0
qc.h(0)
 
# Perform a controlled-X gate on qubit 1, controlled by qubit 0
qc.cx(0, 1)
 
observables_labels = ["IZ", "IX", "ZI", "XI", "ZZ", "XX"]
observables = [SparsePauliOp(label) for label in observables_labels]
 
# switch to QRMI service
from qiskit_ibm_runtime import QiskitRuntimeService
 
service = QiskitRuntimeService()
 
backend = service.backend("...")
 
# Convert to an ISA circuit and layout-mapped observables.
pm = generate_preset_pass_manager(backend=backend, optimization_level=1)
isa_circuit = pm.run(qc)
 
# Construct the Estimator instance.
 
estimator = Estimator(mode=backend)
estimator.options.resilience_level = 1
estimator.options.default_shots = 5000
 
mapped_observables = [
    observable.apply_layout(isa_circuit.layout) for observable in observables
]
 
# One pub, with one circuit to run against six different observables.
job = estimator.run([(isa_circuit, mapped_observables)])
 
job_result = job.result()
 
# Extract the result for the first (and only) pub
pub_result = job_result[0]
 
# Print the estimated expectation value for each observable
print(f"Expectation Value: {pub_result.data.evs}")
 

Here we will use the quantum resource management interface (QRMI), a Slurm SPANK plugin for quantum resources and jobs support which was developed jointly by IBM, Pasqal, the Hartree Centre, and RPI. We made a simple Bell-state circuit and a set of simple observables, and we will run it using Estimator to get the expectation values. To run it we will again need the submission script hello_qiskit.sh, which will have quantum resources as a requirement.

$ vim hello_qiskit.sh
 
#!/bin/bash
#SBATCH --job-name=hello-qiskit
#SBATCH --output=hello_qiskit.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=quantum
#SBATCH --gres=qpu:1
 
srun python /data/ch2/hello_qiskit/hello_qiskit.py
~

Let's go over the submission file and see what is happening there. We have one new option, which is gres. gres is a Slurm option to define extra computational resources. In our case, this new resource would be our quantum resource. Since we specified the resources and the partition of our cluster where quantum resources are available, our Qiskit primitives will use these allocated resources to execute the quantum payload.

Now it's time to run our Slurm job.

$ sbatch hello_qiskit.sh

Then, let's check the status of our program using the squeue command.

$ squeue
#             JOBID PARTITION     NAME         USER ST       TIME  NODES NODELIST(REASON)
#                 1 quantum       hello-qiskit root R        0:01      1 q1

We can explore logs and results after the job is finished.

$ cat hello_qiskit.out | grep Exp
Expectation Value: 0.8372900070983516
 

Summary

So far, we've learned what computational resources are and how to use them to run programs in heterogeneous environments. We've also created and executed two simple 'Hello World' programs: one for a classical resource and the other for a quantum resource, and we've learned how to create shell scripts to submit tasks and view results.

In the following lesson, we will build on this knowledge of resource control to apply programming models to the resources we've acquired during job execution.

All the code and scripts used in this chapter are available for you within our GitHub repository.
