Transpilation optimization with SABRE

Usage estimate: 1 minute on a Heron r2 processor (NOTE: This is an estimate only. Your runtime might vary.)

Learning outcomes

After going through this tutorial, you should understand:

How to configure SABRE parameters (layout_trials, swap_trials, max_iterations) to improve transpilation quality
The trade-offs between transpilation runtime and circuit quality (depth and gate count)
How to customize the SABRE routing heuristic (basic, decay, lookahead) and compare their performance on hardware

Prerequisites

We suggest that you are familiar with the following topics before going through this tutorial:

Transpile circuits: overview of transpilation in Qiskit
Transpiler stages: layout and routing stages
Configure preset pass managers: customizing optimization levels

Background

Transpilation converts quantum circuits into forms compatible with specific quantum hardware. Two key stages are choosing a qubit layout (mapping logical qubits to physical qubits) and gate routing (inserting SWAP gates so multi-qubit gates respect device connectivity).
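To make the routing stage concrete, the following minimal sketch counts how many SWAPs a single two-qubit gate needs on a toy coupling map. The five-qubit line and the helper names shortest_distance and swaps_needed are illustrative assumptions, not Qiskit APIs, and a real router balances many gates at once rather than one.

```python
from collections import deque


def shortest_distance(coupling, src, dst):
    """Breadth-first search distance between two physical qubits."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in coupling[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("qubits are not connected")


def swaps_needed(coupling, src, dst):
    """SWAPs needed to make two qubits adjacent along a shortest path."""
    return max(shortest_distance(coupling, src, dst) - 1, 0)


# Hypothetical 5-qubit line: 0 - 1 - 2 - 3 - 4
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(swaps_needed(line, 0, 1))  # adjacent qubits
print(swaps_needed(line, 0, 4))  # opposite ends of the line
```

A good layout places frequently interacting qubits close together so that this count stays small for as many gates as possible.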

SABRE (SWAP-Based Bidirectional heuristic search algorithm) optimizes both layout and routing. It is especially effective for large-scale circuits (100+ qubits) on devices with complex coupling maps, like IBM® Heron processors. As a heuristic, SABRE aims to reduce the number of inserted SWAP gates and the resulting circuit depth, which in turn tends to improve execution fidelity. Recent improvements in the LightSABRE algorithm further reduce runtimes and gate counts.

In this tutorial, you will first configure SabreLayout with different parameters to optimize a small GHZ circuit and observe the impact on execution fidelity. Then, you will compare SABRE's routing heuristics at scale on real hardware.

Requirements

Before starting this tutorial, be sure you have the following installed:

Qiskit SDK v2.0 or later, with visualization support (pip install 'qiskit[visualization]')
Qiskit Runtime v0.22 or later (pip install qiskit-ibm-runtime)
Qiskit Aer (pip install qiskit-aer)

Setup

from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp
from qiskit_ibm_runtime import QiskitRuntimeService
from qiskit_ibm_runtime import EstimatorOptions
from qiskit_ibm_runtime import EstimatorV2 as Estimator
from qiskit_aer.primitives import EstimatorV2 as AerEstimator
from qiskit.transpiler.passes import (
    SabreLayout,
    SabreSwap,
    BarrierBeforeFinalMeasurements,
    StarPreRouting,
)
from qiskit.transpiler.passes.layout.vf2_layout import VF2LayoutStopReason
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit.passmanager.flow_controllers import ConditionalController
import matplotlib.pyplot as plt
import numpy as np
import time

seed = 42

service = QiskitRuntimeService(
    channel="ibm_cloud",
    token="<YOUR_API_TOKEN>",  # Replace with your actual API token
    instance="<YOUR_INSTANCE_NAME>",  # Replace with your instance name if needed
)
backend = service.least_busy(operational=True, simulator=False)


print(f"Using backend: {backend.name}")

Output:

Using backend: ibm_kingston

Small-scale simulator example

In this section, a noisy simulator based on the real backend's noise model is used to demonstrate how different SabreLayout configurations affect both transpilation quality and execution fidelity. Using qiskit_aer with a noise model derived from actual hardware calibration data allows you to test the transpilation without consuming hardware credits.

Step 1: Map classical inputs to a quantum problem

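We construct a star-topology GHZ circuit with 15 qubits. The first qubit is the hub, with CNOT gates connecting it directly to every other qubit. This topology creates a challenging layout problem: the hub interacts with 14 other qubits, far more than the number of neighbors any physical qubit has in the device's coupling map, so the circuit cannot be laid out without inserting SWAP gates.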

We also define ZZ operators to measure entanglement correlations $\langle Z_0 Z_i \rangle$ across qubit pairs.
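A quick way to see why the star is hard to place is to compare the hub's degree in the circuit's interaction graph with the largest degree available on a device. The sketch below uses a hypothetical 15-qubit line as the device (an assumption for illustration, not the backend's real coupling map). A layout without SWAPs would need a physical qubit with at least as many neighbors as the hub.

```python
def max_degree(edges):
    """Largest number of neighbors of any node in an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree.values())


num_qubits = 15

# Interaction graph of the star GHZ circuit: qubit 0 talks to everyone
star_edges = [(0, i) for i in range(1, num_qubits)]

# Hypothetical device: a simple line of 15 physical qubits
line_edges = [(i, i + 1) for i in range(num_qubits - 1)]

hub_degree = max_degree(star_edges)
device_degree = max_degree(line_edges)
needs_routing = hub_degree > device_degree

print(f"hub degree: {hub_degree}, max device degree: {device_degree}")
print(f"routing required: {needs_routing}")
```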

When you know the circuit structure

SABRE is a general-purpose algorithm and makes no assumptions about circuit structure. For this star-topology GHZ circuit, an optimal routing is actually known: the StarPreRouting pass detects star sub-circuits and rewrites them into a linear chain that maps directly onto any backend with a long enough linear path. This tutorial focuses on SABRE because it works for arbitrary circuits, but if you know your circuit has a clear special structure, applying a specialized pass like StarPreRouting before routing can outperform any heuristic search.
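The idea that a star can be traded for a chain can be checked with a few lines of plain Python. After the Hadamard on qubit 0, the state is an equal superposition of two computational basis states, and CNOT gates simply permute basis states, so it is enough to track those two bit strings. The sketch below is not the rewrite that StarPreRouting performs internally; it only verifies, under that simplification, that a nearest-neighbor CNOT chain prepares the same GHZ state as the star.

```python
def apply_cnots(bits, cnots):
    """Apply CNOT(control, target) gates to a classical bit string."""
    bits = list(bits)
    for control, target in cnots:
        bits[target] ^= bits[control]
    return tuple(bits)


def ghz_branches(num_qubits, cnots):
    """Propagate the two branches created by H on qubit 0 through the CNOTs."""
    zero_branch = (0,) * num_qubits
    one_branch = (1,) + (0,) * (num_qubits - 1)
    return apply_cnots(zero_branch, cnots), apply_cnots(one_branch, cnots)


n = 15
star = [(0, i) for i in range(1, n)]        # hub-and-spoke CNOTs
chain = [(i, i + 1) for i in range(n - 1)]  # nearest-neighbor CNOTs

same_state = ghz_branches(n, star) == ghz_branches(n, chain)
print(f"star and chain prepare the same state: {same_state}")
```

The chain only ever couples qubit i to qubit i + 1, which is why it fits on any device that contains a long enough linear path.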

num_qubits_sim = 15

# Create star-topology GHZ circuit
qc_sim = QuantumCircuit(num_qubits_sim)
qc_sim.h(0)
for i in range(1, num_qubits_sim):
    qc_sim.cx(0, i)
qc_sim.measure_all()

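# ZZ operators: two Z factors separated by i identities, identity elsewhere.
# Note: Qiskit Pauli labels are little-endian (the rightmost character acts on
# qubit 0), so the fixed leftmost Z acts on the highest-index qubit.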
operator_strings_sim = [
    "Z" + "I" * i + "Z" + "I" * (num_qubits_sim - 2 - i)
    for i in range(num_qubits_sim - 1)
]
operators_sim = [SparsePauliOp(op) for op in operator_strings_sim]

Step 2: Optimize problem for quantum hardware execution

The default optimization_level=3 preset pass manager already uses SabreLayout, but with conservative defaults. To explore the impact of stronger settings, that pass is replaced with a custom SabreLayout configured for more aggressive search, while every other pass in the layout stage is left untouched. As a separate point of comparison, a fourth pass manager keeps the default SabreLayout but adds StarPreRouting to the init stage. StarPreRouting is a structure-aware pass that detects star sub-circuits and rewrites them into a linear chain before routing.

The workflow is:

Inspect the default pass manager to see where SabreLayout sits inside the layout stage.
Replace that pass with a custom SabreLayout instance using PassManager.replace(index, passes=...), and build the pm_star variant with pm.init += StarPreRouting().
Run all four pass managers and compare metrics.

The four configurations are:

Config	Description
`pm_1` (default)	Default level-3 preset (`SabreLayout` with `max_iterations=4`, `layout_trials=20`, `swap_trials=20`)
`pm_2`	Custom `SabreLayout` (`max_iterations=4`, `layout_trials=200`, `swap_trials=200`)
`pm_3`	Custom `SabreLayout` (`max_iterations=8`, `layout_trials=200`, `swap_trials=200`)
`pm_star`	Default preset with `StarPreRouting` added to the init stage

Key SABRE parameters:

layout_trials / swap_trials: Control how many candidate layouts and routing solutions SABRE explores. Increasing the number of trials means SABRE samples a wider search space, increasing the chance of finding a better solution.
max_iterations: Controls how many forward-backward routing refinement cycles SABRE performs on each candidate. SABRE iteratively improves the layout by learning from routing feedback, so additional iterations can refine the layout further, although the gains are not guaranteed and depend on the circuit and backend.

Both come at the cost of longer transpilation time. When the wider search does find a better solution, the resulting circuit is shorter and uses fewer gates, which directly reduces decoherence and gate errors on real hardware.
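The effect of the trial count can be illustrated with a toy model in which every trial yields a random cost and the best one is kept. This is only a sketch under strong assumptions: real SABRE trials are not independent uniform draws, and best_of_trials and its cost range are hypothetical. It does show why keeping the best of more attempts can only match or improve the result when the extra attempts extend the same seeded sequence.

```python
import random


def best_of_trials(num_trials, seed):
    """Return the lowest cost among num_trials seeded random attempts."""
    rng = random.Random(seed)
    # Hypothetical cost: a stand-in for the SWAP count of one randomized trial
    costs = [rng.randint(20, 60) for _ in range(num_trials)]
    return min(costs)


for num_trials in (20, 200):
    print(f"{num_trials} trials -> best cost {best_of_trials(num_trials, seed=42)}")
```

Because the first 20 draws are a prefix of the 200 draws, the second result is never worse than the first. The price is the time spent evaluating the extra attempts.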

Step 2a: Inspect the default pass manager. A StagedPassManager is composed of stages (init, layout, routing, translation, optimization, scheduling), each itself a PassManager. Calling .draw() on a stage renders its passes as a graph so we can see where SabreLayout lives.

# Build the default pass manager (no modifications yet)
pm_1 = generate_preset_pass_manager(
    optimization_level=3, backend=backend, seed_transpiler=seed
)

# Visualize the layout stage to see where SabreLayout sits
pm_1.layout.draw()

Output:

In the diagram above, the SabreLayout pass we want to customize lives inside the ConditionalController at position [2] of the layout stage. That controller does two things:

It gates SabreLayout so it only runs when VF2Layout at [1] failed to find a perfect mapping (otherwise the perfect VF2 layout is kept).
It precedes SabreLayout with a BarrierBeforeFinalMeasurements pass that protects measurements from being reordered during SabreLayout's internal routing.

If we just replace(index=2, passes=sl_2), both behaviors are dropped. To keep them, we re-wrap our custom SabreLayout in the same ConditionalController (with the same condition and the protective barrier) before swapping it in.

Step 2b: Build custom SabreLayout passes and replace the default.

cmap = backend.coupling_map

# Custom SabreLayout passes with more aggressive search
sl_2 = SabreLayout(
    coupling_map=cmap,
    seed=seed,
    max_iterations=4,
    layout_trials=200,
    swap_trials=200,
)
sl_3 = SabreLayout(
    coupling_map=cmap,
    seed=seed,
    max_iterations=8,
    layout_trials=200,
    swap_trials=200,
)


# Same condition the preset uses: only run SabreLayout when VF2Layout did not
# find a perfect mapping. This preserves any perfect layout VF2 produced at [1].
def _vf2_match_not_found(property_set):
    if property_set["layout"] is None:
        return True
    return (
        property_set["VF2Layout_stop_reason"] is not None
        and property_set["VF2Layout_stop_reason"]
        is not VF2LayoutStopReason.SOLUTION_FOUND
    )


def wrap_sabre(sabre_pass):
    """Re-wrap a SabreLayout in the original ConditionalController + barrier."""
    return ConditionalController(
        [
            BarrierBeforeFinalMeasurements(
                "qiskit.transpiler.internal.routing.protection.barrier"
            ),
            sabre_pass,
        ],
        condition=_vf2_match_not_found,
    )


# Build two fresh pass managers and swap in the wrapped custom SabreLayout at index 2
pm_2 = generate_preset_pass_manager(
    optimization_level=3, backend=backend, seed_transpiler=seed
)
pm_3 = generate_preset_pass_manager(
    optimization_level=3, backend=backend, seed_transpiler=seed
)
pm_2.layout.replace(index=2, passes=wrap_sabre(sl_2))
pm_3.layout.replace(index=2, passes=wrap_sabre(sl_3))

# Build pm_star: default preset with StarPreRouting added to the init stage
pm_star = generate_preset_pass_manager(
    optimization_level=3, backend=backend, seed_transpiler=seed
)
pm_star.init += StarPreRouting()

# Visualize pm_3 after replacement (pm_2 has the same structure, only max_iterations differs)
pm_3.layout.draw()

Output:

Position [2] is now a ConditionalController again, identical in shape to the default, but the inner SabreLayout is our custom one (with layout_trials=200, swap_trials=200, and max_iterations=8 for pm_3; pm_2 is identical apart from max_iterations=4). The protective barrier and the _vf2_match_not_found gating are preserved, so the only difference between pm_2/pm_3 and pm_1 is the SABRE configuration itself. pm_star keeps the default SabreLayout and only adds StarPreRouting at the end of the init stage.
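The gating logic can be exercised in isolation with plain dictionaries standing in for the transpiler's property set. In the sketch below, StopReason is a hypothetical stand-in for VF2LayoutStopReason, and sabre_should_run mirrors the structure of _vf2_match_not_found so that the four possible situations can be inspected without running a transpilation.

```python
from enum import Enum


class StopReason(Enum):
    """Hypothetical stand-in for VF2LayoutStopReason."""

    SOLUTION_FOUND = "solution found"
    NO_SOLUTION_FOUND = "no solution found"


def sabre_should_run(property_set):
    """Return True when no perfect VF2 layout is available."""
    if property_set.get("layout") is None:
        return True
    reason = property_set.get("VF2Layout_stop_reason")
    return reason is not None and reason is not StopReason.SOLUTION_FOUND


cases = {
    "no layout yet": {"layout": None},
    "VF2 succeeded": {
        "layout": "some layout",
        "VF2Layout_stop_reason": StopReason.SOLUTION_FOUND,
    },
    "VF2 gave up": {
        "layout": "some layout",
        "VF2Layout_stop_reason": StopReason.NO_SOLUTION_FOUND,
    },
    "layout set, VF2 never ran": {"layout": "some layout"},
}
for description, property_set in cases.items():
    print(f"{description}: run SabreLayout = {sabre_should_run(property_set)}")
```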

Step 2c: Run each pass manager and compare.

results_sim = {}
for name, pm in [
    ("pm_1 (4,20,20)", pm_1),
    ("pm_2 (4,200,200)", pm_2),
    ("pm_3 (8,200,200)", pm_3),
    ("pm_star (default + StarPreRouting)", pm_star),
]:
    t0 = time.time()
    tqc = pm.run(qc_sim)
    elapsed = time.time() - t0
    depth = tqc.depth(lambda x: x.operation.num_qubits == 2)
    size = tqc.size()
    ops_mapped = [op.apply_layout(tqc.layout) for op in operators_sim]
    results_sim[name] = {
        "tqc": tqc,
        "ops": ops_mapped,
        "depth": depth,
        "size": size,
        "time": elapsed,
    }
    print(f"{name}: 2Q Depth {depth}, Size {size}, Time {elapsed:.2f}s")

# Print improvement relative to default (pm_1)
baseline = results_sim["pm_1 (4,20,20)"]
print("\nImprovement vs. default (pm_1):")
for name in [
    "pm_2 (4,200,200)",
    "pm_3 (8,200,200)",
    "pm_star (default + StarPreRouting)",
]:
    r = results_sim[name]
    depth_pct = (baseline["depth"] - r["depth"]) / baseline["depth"] * 100
    size_pct = (baseline["size"] - r["size"]) / baseline["size"] * 100
    print(f"  {name}: 2Q depth {depth_pct:+.1f}%, size {size_pct:+.1f}%")

Output:

pm_1 (4,20,20): 2Q Depth 38, Size 183, Time 0.01s
pm_2 (4,200,200): 2Q Depth 36, Size 183, Time 0.15s
pm_3 (8,200,200): 2Q Depth 30, Size 158, Time 0.16s
pm_star (default + StarPreRouting): 2Q Depth 26, Size 160, Time 0.01s

Improvement vs. default (pm_1):
  pm_2 (4,200,200): 2Q depth +5.3%, size +0.0%
  pm_3 (8,200,200): 2Q depth +21.1%, size +13.7%
  pm_star (default + StarPreRouting): 2Q depth +31.6%, size +12.6%

All three modified pass managers produced circuits with lower 2Q depth than the default. The aggressive SABRE configurations (pm_2 and pm_3) trade longer transpilation time for a wider search, while pm_star leverages the star structure of the circuit and produces an even shallower result without paying any extra transpilation cost. The exact gains will vary from run to run, but the general trend is consistent: more SABRE trials and iterations let the heuristic search a wider space, and structure-aware passes like StarPreRouting can sidestep that search entirely when the circuit shape matches.

Even at this small scale (15 qubits), the room for improvement is enough that all three approaches beat the default. With larger circuits (100+ qubits), the search space grows dramatically and the benefits of both increased trials and structure-aware passes become much more pronounced, as the large-scale section will show.

pm_names = list(results_sim.keys())
depths = [results_sim[n]["depth"] for n in pm_names]
sizes = [results_sim[n]["size"] for n in pm_names]
times = [results_sim[n]["time"] for n in pm_names]
colors = ["#404080", "#2a9d8f", "#a8d05e", "#e29bdd"]
x = np.arange(len(pm_names))

fig, axs = plt.subplots(1, 3, figsize=(14, 5))

# 2Q Depth
bars = axs[0].bar(x, depths, color=colors)
axs[0].set_ylabel("2Q Depth", fontsize=11)
axs[0].set_title("Two-Qubit Gate Depth", fontsize=13)
axs[0].set_ylim(0, max(depths) * 1.2)
for bar, val in zip(bars, depths):
    axs[0].text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + max(depths) * 0.02,
        str(val),
        ha="center",
        va="bottom",
        fontsize=11,
        fontweight="bold",
    )
for i in range(1, len(depths)):
    pct = (depths[0] - depths[i]) / depths[0] * 100
    if pct != 0:
        axs[0].text(
            bars[i].get_x() + bars[i].get_width() / 2,
            bars[i].get_height() / 2,
            f"{pct:+.0f}%",
            ha="center",
            va="center",
            fontsize=10,
            color="white",
            fontweight="bold",
        )

# Size
bars = axs[1].bar(x, sizes, color=colors)
axs[1].set_ylabel("Gate Count", fontsize=11)
axs[1].set_title("Circuit Size", fontsize=13)
axs[1].set_ylim(0, max(sizes) * 1.2)
for bar, val in zip(bars, sizes):
    axs[1].text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + max(sizes) * 0.02,
        str(val),
        ha="center",
        va="bottom",
        fontsize=11,
        fontweight="bold",
    )
for i in range(1, len(sizes)):
    pct = (sizes[0] - sizes[i]) / sizes[0] * 100
    if abs(pct) > 0.1:
        axs[1].text(
            bars[i].get_x() + bars[i].get_width() / 2,
            bars[i].get_height() / 2,
            f"{pct:+.0f}%",
            ha="center",
            va="center",
            fontsize=10,
            color="white",
            fontweight="bold",
        )

# Time
bars = axs[2].bar(x, times, color=colors)
axs[2].set_ylabel("Time (s)", fontsize=11)
axs[2].set_title("Transpilation Time", fontsize=13)
axs[2].set_ylim(0, max(times) * 1.3)
for bar, val in zip(bars, times):
    axs[2].text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + max(times) * 0.03,
        f"{val:.2f}s",
        ha="center",
        va="bottom",
        fontsize=11,
        fontweight="bold",
    )

for ax in axs:
    ax.set_xticks(x)
    ax.set_xticklabels(pm_names, fontsize=8, rotation=15)
    ax.grid(axis="y", linestyle="--", alpha=0.5)

plt.suptitle(
    "Transpilation quality vs. configuration",
    fontsize=14,
    fontweight="bold",
    y=1.02,
)
plt.tight_layout()
plt.show()

Output:

Step 3: Execute using Qiskit primitives

We run each transpiled circuit 10 times using the Aer EstimatorV2 with a noise model derived from the real backend. Since noisy simulation results vary between runs, averaging over multiple runs gives more reliable fidelity estimates and lets us quantify the statistical uncertainty with error bars.

# Create a noisy estimator from the real backend's noise model
noisy_estimator = AerEstimator.from_backend(backend)

num_runs = 10
# sim_all_runs[name] = list of arrays, one per run
sim_all_runs = {name: [] for name in results_sim}

for run in range(num_runs):
    for name, r in results_sim.items():
        job = noisy_estimator.run([(r["tqc"], r["ops"])])
        evs = list(job.result()[0].data.evs)
        sim_all_runs[name].append(evs)
    print(f"Run {run + 1}/{num_runs} done")

# Compute mean and std across runs for each config
sim_stats = {}
for name in results_sim:
    all_evs = np.array(sim_all_runs[name])  # shape (num_runs, num_operators)
    sim_stats[name] = {
        "mean": np.mean(all_evs, axis=0),
        "std": np.std(all_evs, axis=0),
        "overall_mean": np.mean(all_evs),
        "overall_std": np.std(
            np.mean(all_evs, axis=1)
        ),  # std of per-run averages
    }
    print(
        f"{name}: mean fidelity = {sim_stats[name]['overall_mean']:.4f} +/- {sim_stats[name]['overall_std']:.4f}"
    )

Output:

Run 1/10 done
Run 2/10 done
Run 3/10 done
Run 4/10 done
Run 5/10 done
Run 6/10 done
Run 7/10 done
Run 8/10 done
Run 9/10 done
Run 10/10 done
pm_1 (4,20,20): mean fidelity = 0.9510 +/- 0.0094
pm_2 (4,200,200): mean fidelity = 0.9513 +/- 0.0043
pm_3 (8,200,200): mean fidelity = 0.9540 +/- 0.0065
pm_star (default + StarPreRouting): mean fidelity = 0.9547 +/- 0.0072

Because this is a small circuit, the fidelity values land relatively close across all four configurations. The circuits are short enough that the simulated hardware noise does not heavily penalize even the least optimized version. Mean fidelity broadly tracks 2Q depth: pm_3 and pm_star, the two shallowest circuits, achieve the highest fidelities and are essentially tied within their error bars. pm_2 is a useful counter-example: although its 2Q depth is lower than pm_1's, its mean fidelity is essentially unchanged (0.9513 versus 0.9510, a difference far smaller than the run-to-run spread), which is a reminder that the depth-to-fidelity link is statistical rather than deterministic. The specific qubits a layout selects and the calibration of those qubits at run time also matter.
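One rough way to read the printed numbers is to ask whether the gap between two configurations exceeds their combined spread. The sketch below applies that check to the means and standard deviations printed above. Combining the two standard deviations in quadrature is an assumption chosen for illustration, not a formal significance test, and within_error is a hypothetical helper.

```python
import math


def within_error(mean_a, std_a, mean_b, std_b):
    """True when the gap is no larger than the combined one-sigma spread."""
    return abs(mean_a - mean_b) <= math.hypot(std_a, std_b)


# (mean fidelity, std of per-run averages) copied from the output above
results = {
    "pm_1": (0.9510, 0.0094),
    "pm_2": (0.9513, 0.0043),
    "pm_3": (0.9540, 0.0065),
    "pm_star": (0.9547, 0.0072),
}

baseline = results["pm_1"]
for name, (mean, std) in results.items():
    gap = mean - baseline[0]
    tied = within_error(baseline[0], baseline[1], mean, std)
    print(f"{name}: gap vs pm_1 = {gap:+.4f}, within combined spread = {tied}")
```

By this criterion, none of the gaps at this circuit size stands out from the run-to-run variation, which is consistent with treating the 15-qubit comparison as a trend rather than a ranking.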

Step 4: Post-process and return result in desired classical format

Next, plot the entanglement correlations $\langle Z_0 Z_i \rangle$ as a function of qubit distance, along with the mean correlation as a single fidelity metric. In an ideal (noiseless) case, all correlations would be 1. With realistic noise, each additional gate introduces error and each additional time step allows decoherence, so a transpiled circuit with lower depth and fewer gates (especially two-qubit gates) should preserve entanglement better.
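The ideal value of 1 follows directly from the structure of the GHZ state, which is an equal superposition of the all-zeros and all-ones bit strings. Because ZZ is diagonal in the computational basis, its expectation value is the weighted average of the parity sign of the two selected bits. The helper zz_expectation below is a hypothetical name, and the sketch covers only equal-weight superpositions of basis states.

```python
def zz_expectation(branches, a, b):
    """<Z_a Z_b> for an equal-weight superposition of basis states."""
    total = 0.0
    for bits in branches:
        # Z contributes +1 for bit 0 and -1 for bit 1
        total += (-1) ** (bits[a] + bits[b])
    return total / len(branches)


n = 15
ghz = [(0,) * n, (1,) * n]
correlations = [zz_expectation(ghz, 0, i) for i in range(1, n)]
print(correlations)

# Contrast: two qubits in an unentangled |+>|+> state show no ZZ correlation
plus_plus = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(zz_expectation(plus_plus, 0, 1))
```

Any measured value below 1 therefore reflects noise accumulated during state preparation and measurement.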

data_sim = list(range(1, len(operators_sim) + 1))
markers = ["o", "s", "^", "*"]
colors_line = ["#404080", "#2a9d8f", "#a8d05e", "#e29bdd"]

fig, (ax1, ax2) = plt.subplots(
    1, 2, figsize=(14, 5), gridspec_kw={"width_ratios": [2.5, 1]}
)

# Left: correlations vs distance with error bars (mean +/- 1 std)
for (name, stats), marker, color in zip(
    sim_stats.items(), markers, colors_line
):
    ax1.errorbar(
        data_sim,
        stats["mean"],
        yerr=stats["std"],
        marker=marker,
        label=name,
        color=color,
        linewidth=2,
        capsize=3,
        capthick=1,
        elinewidth=1,
    )

ax1.set_xlabel("Distance between qubits $i$", fontsize=11)
ax1.set_ylabel(r"$\langle Z_0 Z_i \rangle$", fontsize=11)
ax1.set_title(
    "Entanglement correlations vs. qubit distance (avg. of 10 runs)",
    fontsize=12,
)
ax1.legend(fontsize=9)
ax1.grid(alpha=0.3)

# Right: mean correlation bar chart with error bars
names = list(sim_stats.keys())
means = [sim_stats[n]["overall_mean"] for n in names]
stds = [sim_stats[n]["overall_std"] for n in names]
x_bar = np.arange(len(names))
bars = ax2.bar(
    x_bar, means, yerr=stds, color=colors_line, capsize=5, ecolor="gray"
)
ax2.set_ylabel(r"Mean $\langle Z_0 Z_i \rangle$", fontsize=11)
ax2.set_title("Average fidelity", fontsize=13, pad=12)
y_range = max(means) - min(means) if max(means) != min(means) else 0.01
# Top of ylim accounts for the bar height + std error bar + headroom for the value label
y_top = max(m + s for m, s in zip(means, stds)) + y_range * 1.5
ax2.set_ylim(min(means) - y_range * 0.8, y_top)
for bar, val, std in zip(bars, means, stds):
    ax2.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + std + y_range * 0.15,
        f"{val:.4f}",
        ha="center",
        va="bottom",
        fontsize=10,
        fontweight="bold",
    )
# Annotate % change vs pm_1
baseline_mean = means[0]
for i in range(1, len(means)):
    pct = (means[i] - baseline_mean) / baseline_mean * 100
    if abs(pct) > 0.01:
        mid_y = (means[i] + ax2.get_ylim()[0]) / 2
        ax2.text(
            bars[i].get_x() + bars[i].get_width() / 2,
            mid_y,
            f"{pct:+.1f}%",
            ha="center",
            va="center",
            fontsize=10,
            color="white",
            fontweight="bold",
        )
ax2.set_xticks(x_bar)
ax2.set_xticklabels(names, fontsize=8, rotation=15)
ax2.grid(axis="y", linestyle="--", alpha=0.5)

fig.tight_layout()
plt.show()

Output:

The results show a clear connection between transpilation quality and execution fidelity, with a few useful caveats:

pm_1 (default): Baseline. With only 20 trials and four iterations, SABRE has limited room to optimize, resulting in the deepest of the SABRE-only circuits.
pm_2 (more trials): Exploring ten times more candidates finds a slightly shallower layout, but mean fidelity is roughly flat (and can even dip below the baseline within noise) because the depth gain is small at this scale.
pm_3 (more trials + more iterations): Doubling max_iterations to 8 gives SABRE more refinement cycles, producing the shallowest SABRE-only circuit and the highest mean fidelity among the SABRE-only configurations.
pm_star (default + StarPreRouting): Adds StarPreRouting to the init stage of an otherwise default preset. The structure-aware rewrite collapses the star into a linear chain that the rest of the transpiler maps onto the device's linear path, producing the shallowest circuit overall (slightly better than pm_3) and matching pm_3 on fidelity within error bars. It does this with the same transpilation time as the default, since the rewrite is essentially free compared to SABRE's stochastic search.

Note that increasing max_iterations does not always have a positive impact. In this case it helped significantly, but for other circuits or backends the additional iterations may not yield further improvement, or may even slightly hurt performance by over-refining toward a local minimum. In general, you should increase layout_trials and swap_trials as much as your time budget allows, since more trials generally increase the chance of finding a better layout. Increasing max_iterations is worth testing but should be validated for your specific use case. Specialized passes like StarPreRouting are similar in spirit but more circuit-dependent: they only help when the circuit actually contains the structure they target. The gain is large when applicable and zero otherwise, but they cost essentially nothing to try.

Large-scale hardware example

In addition to adjusting trial numbers, SABRE supports customizing the routing heuristic. SABRE offers three heuristics:

basic: A simple greedy approach that selects the swap that most reduces the total distance between the qubits of the gates currently waiting to be routed.
decay (the heuristic used by the preset pass managers): Builds on lookahead and additionally weights qubits based on recent activity, discouraging repeated swaps on the same qubits.
lookahead: Extends basic by also scoring a window of upcoming gates, potentially finding better swap sequences.
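The three scoring rules can be compared on a toy example. The sketch below is a simplified illustration in the spirit of the descriptions above, not Qiskit's implementation: the four-qubit line, the weight of 0.5, the decay values, and all function names are assumptions chosen for the example.

```python
from collections import deque


def distance(coupling, src, dst):
    """Breadth-first search distance between two physical qubits."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in coupling[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("qubits are not connected")


def with_swap(layout, swap):
    """Return the logical-to-physical layout after swapping two physical qubits."""
    p, q = swap
    exchange = {p: q, q: p}
    return {logical: exchange.get(phys, phys) for logical, phys in layout.items()}


def basic_cost(coupling, layout, gates):
    """Summed distance of the gates that are waiting to be routed."""
    return sum(distance(coupling, layout[a], layout[b]) for a, b in gates)


def lookahead_cost(coupling, layout, front, upcoming, weight=0.5):
    """Front-layer cost plus a down-weighted cost for upcoming gates."""
    front_term = basic_cost(coupling, layout, front) / len(front)
    upcoming_term = basic_cost(coupling, layout, upcoming) / len(upcoming)
    return front_term + weight * upcoming_term


def decay_cost(coupling, layout, front, upcoming, swap, decay):
    """Lookahead cost, scaled up when the swap touches recently used qubits."""
    penalty = max(decay[swap[0]], decay[swap[1]])
    swapped = with_swap(layout, swap)
    return penalty * lookahead_cost(coupling, swapped, front, upcoming)


# Hypothetical 4-qubit line 0 - 1 - 2 - 3 with a trivial initial layout
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
layout = {q: q for q in range(4)}
front, upcoming = [(0, 3)], [(1, 3)]
candidates = [(0, 1), (2, 3)]
decay = {0: 1.0, 1: 1.0, 2: 1.5, 3: 1.0}  # qubit 2 was swapped recently

basic = {s: basic_cost(line, with_swap(layout, s), front) for s in candidates}
look = {
    s: lookahead_cost(line, with_swap(layout, s), front, upcoming)
    for s in candidates
}
dec = {s: decay_cost(line, layout, front, upcoming, s, decay) for s in candidates}

print("basic    :", basic)
print("lookahead:", look, "-> picks", min(look, key=look.get))
print("decay    :", dec, "-> picks", min(dec, key=dec.get))
```

In this example the basic score cannot separate the two candidate swaps, the lookahead score prefers the swap that also helps the upcoming gate, and the decay penalty steers the choice away from the recently used qubit.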

To use a custom heuristic, create a SabreSwap pass and connect it to SabreLayout via the routing_pass parameter.

A fourth pass manager is added to the comparison: pm_star_hw, which keeps the default SabreLayout/SabreSwap settings but adds StarPreRouting to the init stage. At this scale (100 qubits) the SABRE search is harder, and the rewrite from a star into a linear chain becomes a clear win because a Heron processor has linear paths long enough to host the resulting circuit.

Here we compare all three SABRE heuristics plus StarPreRouting at scale on a 100-qubit GHZ circuit. We run multiple layout trials with different seeds for the SABRE configurations, select the best transpiled circuit from each, and submit them all to real hardware alongside the StarPreRouting result.

Steps 1-4 compressed into a single code block

Here the full workflow is put together at a larger scale. When using SabreSwap as the routing_pass for SabreLayout, only one layout trial is performed per call, so the following code cell loops over seeds to explore the layout space.

We use the same wrap_sabre helper defined in the small-scale Step 2 (above), and add an analogous wrap_routing helper because the routing stage at index [1] is also a ConditionalController([BarrierBeforeFinalMeasurements, routing_pass], ...). Replacing it bare would similarly drop the protective barrier and the _swap_condition gating.
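The seed sweep boils down to collecting one record per seed and keeping the best one. The sketch below shows that selection on hypothetical records; the numbers are placeholders rather than measurements, and breaking ties on total size is an addition for illustration (the tutorial code selects on 2Q depth alone).

```python
from statistics import mean, pstdev

# Hypothetical per-seed results; the values are placeholders, not measurements
trials = [
    {"seed": 42, "depth": 410, "size": 2900},
    {"seed": 43, "depth": 395, "size": 3100},
    {"seed": 44, "depth": 395, "size": 2750},
    {"seed": 45, "depth": 430, "size": 2800},
]


def pick_best(trials):
    """Lowest 2Q depth wins; total size breaks ties."""
    return min(trials, key=lambda t: (t["depth"], t["size"]))


best = pick_best(trials)
depths = [t["depth"] for t in trials]

print(f"best seed: {best['seed']} (2Q depth={best['depth']}, size={best['size']})")
# pstdev is the population standard deviation, matching np.std's default
print(f"2Q depth mean: {mean(depths):.1f}, std: {pstdev(depths):.1f}")
```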

# -------------------------Step 1-------------------------

num_qubits = 100

# Create star-topology GHZ circuit
qc = QuantumCircuit(num_qubits)
qc.h(0)
for i in range(1, num_qubits):
    qc.cx(0, i)
qc.measure_all()

# ZZ operators
operator_strings = [
    "Z" + "I" * i + "Z" + "I" * (num_qubits - 2 - i)
    for i in range(num_qubits - 1)
]
operators = [SparsePauliOp(op) for op in operator_strings]

# -------------------------Step 2-------------------------

num_seeds = 10
seed_list = [seed + i for i in range(num_seeds)]
swap_trials = 200


# The default routing[1] is a ConditionalController([barrier, routing_pass],
# condition=_swap_condition); we re-wrap so the new routing pass keeps the
# protective barrier and is skipped when routing isn't needed (matches the preset).
def _swap_condition(property_set):
    return not property_set["routing_not_needed"]


def wrap_routing(routing_pass):
    return ConditionalController(
        [
            BarrierBeforeFinalMeasurements(
                "qiskit.transpiler.internal.routing.protection.barrier"
            ),
            routing_pass,
        ],
        condition=_swap_condition,
    )


heuristic_results = {}

# Three SABRE heuristics, swept over seeds
for heuristic in ["basic", "decay", "lookahead"]:
    trials = []
    for s in seed_list:
        sr = SabreSwap(
            coupling_map=cmap, heuristic=heuristic, trials=swap_trials, seed=s
        )
        sl = SabreLayout(coupling_map=cmap, routing_pass=sr, seed=s)
        pm = generate_preset_pass_manager(
            optimization_level=3, backend=backend, seed_transpiler=s
        )
        # Re-wrap each custom pass in its original ConditionalController + barrier
        # (wrap_sabre is defined in the small-scale Step 2 cell above).
        pm.layout.replace(index=2, passes=wrap_sabre(sl))
        pm.routing.replace(index=1, passes=wrap_routing(sr))

        t0 = time.time()
        tqc = pm.run(qc)
        elapsed = time.time() - t0
        depth = tqc.depth(lambda x: x.operation.num_qubits == 2)
        size = tqc.size()
        trials.append(
            {
                "tqc": tqc,
                "depth": depth,
                "size": size,
                "time": elapsed,
                "seed": s,
            }
        )

    heuristic_results[heuristic] = trials

# Default preset + StarPreRouting in init, also swept over seeds for a fair comparison
star_trials = []
for s in seed_list:
    pm_star_hw = generate_preset_pass_manager(
        optimization_level=3, backend=backend, seed_transpiler=s
    )
    pm_star_hw.init += StarPreRouting()

    t0 = time.time()
    tqc = pm_star_hw.run(qc)
    elapsed = time.time() - t0
    depth = tqc.depth(lambda x: x.operation.num_qubits == 2)
    size = tqc.size()
    star_trials.append(
        {
            "tqc": tqc,
            "depth": depth,
            "size": size,
            "time": elapsed,
            "seed": s,
        }
    )
heuristic_results["StarPreRouting"] = star_trials

# Print summary for each entry
for label in ["basic", "decay", "lookahead", "StarPreRouting"]:
    trials = heuristic_results[label]
    depths = [t["depth"] for t in trials]
    sizes = [t["size"] for t in trials]
    best = min(trials, key=lambda t: t["depth"])
    print(f"{label}:")
    print(
        f"  2Q depth: min: {min(depths)}, mean: {np.mean(depths):.1f}, std: {np.std(depths):.1f}"
    )
    print(
        f"  size    : min: {min(sizes)}, mean: {np.mean(sizes):.1f}, std: {np.std(sizes):.1f}"
    )
    print(
        f"  best seed: {best['seed']} (2Q depth={best['depth']}, size={best['size']})"
    )

Output:

basic:
  2Q depth: min: 524, mean: 570.5, std: 39.9
  size    : min: 3819, mean: 4227.1, std: 360.6
  best seed: 51 (2Q depth=524, size=3852)
decay:
  2Q depth: min: 387, mean: 436.4, std: 41.7
  size    : min: 2687, mean: 3183.1, std: 459.3
  best seed: 45 (2Q depth=387, size=2786)
lookahead:
  2Q depth: min: 364, mean: 424.6, std: 36.5
  size    : min: 2335, mean: 3014.6, std: 388.1
  best seed: 51 (2Q depth=364, size=2485)
StarPreRouting:
  2Q depth: min: 196, mean: 196.0, std: 0.0
  size    : min: 1151, mean: 1151.0, std: 0.0
  best seed: 42 (2Q depth=196, size=1151)

hw_colors = {
    "basic": "#ff7f0e",
    "decay": "#d62728",
    "lookahead": "#1f77b4",
    "StarPreRouting": "#2a9d8f",
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))

for label in ["basic", "decay", "lookahead", "StarPreRouting"]:
    trials = heuristic_results[label]
    depths = [t["depth"] for t in trials]
    sizes = [t["size"] for t in trials]
    seeds = [t["seed"] for t in trials]
    color = hw_colors[label]

    ax1.scatter(
        seeds,
        depths,
        label=label,
        color=color,
        alpha=0.8,
        edgecolor="k",
        s=60,
    )
    ax1.axhline(np.mean(depths), color=color, linestyle="--", alpha=0.5)

    ax2.scatter(
        seeds,
        sizes,
        label=label,
        color=color,
        alpha=0.8,
        edgecolor="k",
        s=60,
    )
    ax2.axhline(np.mean(sizes), color=color, linestyle="--", alpha=0.5)

ax1.set_xlabel("Seed", fontsize=11)
ax1.set_ylabel("2Q Depth", fontsize=11)
ax1.set_title("Two-Qubit Gate Depth per Seed", fontsize=13)
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3)

ax2.set_xlabel("Seed", fontsize=11)
ax2.set_ylabel("Gate Count", fontsize=11)
ax2.set_title("Circuit Size per Seed", fontsize=13)
ax2.legend(fontsize=10)
ax2.grid(alpha=0.3)

plt.suptitle(
    "Transpilation variability across seeds: SABRE heuristics vs. StarPreRouting",
    fontsize=14,
    fontweight="bold",
    y=1.02,
)
plt.tight_layout()
plt.show()

# Summary comparison
for label in ["basic", "decay", "lookahead", "StarPreRouting"]:
    best = min(heuristic_results[label], key=lambda t: t["depth"])
    print(
        f"{label}: best 2Q depth={best['depth']}, size={best['size']} (seed={best['seed']})"
    )

Output:

basic: best 2Q depth=524, size=3852 (seed=51)
decay: best 2Q depth=387, size=2786 (seed=45)
lookahead: best 2Q depth=364, size=2485 (seed=51)
StarPreRouting: best 2Q depth=196, size=1151 (seed=42)

# -------------------------Step 3: Execute on hardware-------------------------

best_circuits = {}
for label in ["basic", "decay", "lookahead", "StarPreRouting"]:
    best_circuits[label] = min(
        heuristic_results[label], key=lambda t: t["depth"]
    )
    b = best_circuits[label]
    print(f"Best {label}: 2Q depth={b['depth']}, size={b['size']}")

options = EstimatorOptions()
options.resilience_level = 2
options.dynamical_decoupling.enable = True
options.dynamical_decoupling.sequence_type = "XY4"
estimator = Estimator(backend, options=options)

# Tag the jobs before submitting them so the tag is attached to every job
estimator.options.environment.job_tags = ["TUT_TOWS"]

hw_jobs = {}
hw_ops = {}
for label, best in best_circuits.items():
    hw_ops[label] = [op.apply_layout(best["tqc"].layout) for op in operators]
    hw_jobs[label] = estimator.run([(best["tqc"], hw_ops[label])])
    print(f"{label} job: {hw_jobs[label].job_id()}")

hw_results = {}
for label, job in hw_jobs.items():
    hw_results[label] = job.result()[0]
    print(f"{label} job done")

Output:

Best basic: 2Q depth=524, size=3852
Best decay: 2Q depth=387, size=2786
Best lookahead: 2Q depth=364, size=2485
Best StarPreRouting: 2Q depth=196, size=1151
basic job: d81q5tnoha1c73bknprg
decay job: d81q5tugbeec73aktopg
lookahead job: d81q5to0bvlc73d1epe0
StarPreRouting job: d81q5u7tjchs73bn82hg
basic job done
decay job done
lookahead job done
StarPreRouting job done

# -------------------------Step 4: Post-process-------------------------

data = list(range(1, len(operators) + 1))
hw_markers = {
    "basic": "D",
    "decay": "o",
    "lookahead": "s",
    "StarPreRouting": "*",
}
hw_labels = ["basic", "decay", "lookahead", "StarPreRouting"]

fig, (ax1, ax2) = plt.subplots(
    1, 2, figsize=(14, 5), gridspec_kw={"width_ratios": [2.5, 1]}
)

# Left: correlations vs distance
for label in hw_labels:
    evs = list(hw_results[label].data.evs)
    b = best_circuits[label]
    ax1.plot(
        data,
        evs,
        marker=hw_markers[label],
        color=hw_colors[label],
        linewidth=2,
        label=f"{label} (2Q depth={b['depth']}, size={b['size']})",
        markersize=5 if label == "StarPreRouting" else 4,
    )

ax1.set_xlabel("Distance between qubits $i$", fontsize=11)
ax1.set_ylabel(r"$\langle Z_0 Z_i \rangle$", fontsize=11)
ax1.set_title(
    "Entanglement correlations vs. qubit distance (hardware)", fontsize=12
)
ax1.legend(fontsize=9)
ax1.grid(alpha=0.3)

# Right: mean fidelity bar chart
hw_means = [np.mean(list(hw_results[label].data.evs)) for label in hw_labels]
hw_bar_colors = [hw_colors[label] for label in hw_labels]
x_bar = np.arange(len(hw_labels))
bars = ax2.bar(x_bar, hw_means, color=hw_bar_colors)
ax2.set_ylabel(r"Mean $\langle Z_0 Z_i \rangle$", fontsize=11)
ax2.set_title("Average fidelity", fontsize=13)
y_range = (
    max(hw_means) - min(hw_means) if max(hw_means) != min(hw_means) else 0.01
)
ax2.set_ylim(min(hw_means) - y_range * 0.2, max(hw_means) + y_range * 0.15)
for bar, val in zip(bars, hw_means):
    ax2.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + y_range * 0.05,
        f"{val:.4f}",
        ha="center",
        va="bottom",
        fontsize=11,
        fontweight="bold",
    )
ax2.set_xticks(x_bar)
ax2.set_xticklabels(hw_labels, fontsize=9, rotation=15)
ax2.grid(axis="y", linestyle="--", alpha=0.5)

fig.tight_layout()
plt.show()

print("\nMean fidelity:")
for label, m in zip(hw_labels, hw_means):
    print(f"  {label}: {m:.4f}")

Output:


Mean fidelity:
  basic: 0.0344
  decay: 0.1298
  lookahead: 0.1857
  StarPreRouting: 0.3295

Analysis

The scatter plots show significant variability across seeds for all three SABRE heuristics, which underscores the importance of trying multiple seeds and trials rather than relying on a single transpilation. The StarPreRouting points are essentially flat across seeds because the rewrite from a star into a linear chain is deterministic given the structure; the downstream SABRE routing then has very little freedom on a linear chain, so the seed has almost no effect on the final depth or size.
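The seed-to-seed spread described above can be put into numbers with a short summary. The following is a minimal sketch, assuming results are stored in the same shape as `heuristic_results` (a dict mapping each label to a list of per-seed dicts with a `depth` entry). The function name `summarize_depths` and the values in `toy_results` are illustrative placeholders, not data from this tutorial's runs.

```python
from statistics import mean, pstdev


def summarize_depths(results):
    """Return best, mean, and population std of 'depth' for each label."""
    summary = {}
    for label, trials in results.items():
        depths = [t["depth"] for t in trials]
        summary[label] = {
            "best": min(depths),
            "mean": mean(depths),
            "std": pstdev(depths),
        }
    return summary


# Placeholder values for illustration only; NOT the tutorial's measured data.
toy_results = {
    "seed_sensitive": [
        {"seed": 1, "depth": 400},
        {"seed": 2, "depth": 440},
        {"seed": 3, "depth": 420},
    ],
    "seed_insensitive": [
        {"seed": 1, "depth": 200},
        {"seed": 2, "depth": 200},
        {"seed": 3, "depth": 200},
    ],
}

seed_summary = summarize_depths(toy_results)
for label, stats in seed_summary.items():
    print(
        f"{label}: best={stats['best']}, "
        f"mean={stats['mean']:.1f}, std={stats['std']:.1f}"
    )
```

A standard deviation near zero corresponds to the flat pattern in the scatter plot, while a large value signals that trying more seeds is likely to pay off.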

From the transpilation results, both the decay and lookahead heuristics consistently outperform basic by a wide margin. The basic heuristic, while fast, uses a simple greedy strategy that often leads to substantially deeper circuits. For this star-topology GHZ circuit, lookahead tends to produce the lowest 2Q depth and gate count among the SABRE heuristics, since its forward-looking cost function is well suited to circuits with long-range connectivity patterns. StarPreRouting, however, outperforms all three by a substantial margin (best 2Q depth of 196 versus 364 for lookahead): by rewriting the star into a linear chain before routing, it sidesteps most of the search problem and delivers a circuit that the rest of the transpiler can map onto a linear path with minimal additional SWAPs.

That advantage carries straight over to hardware fidelity. Lower 2Q depth and gate count do not always translate one-for-one to higher fidelity (the specific physical qubits a layout uses and their calibration at run time also matter), but when the depth gap is as large as the one between SABRE and StarPreRouting here, the structure-aware approach wins decisively because the circuit accumulates far less decoherence and far fewer two-qubit error events. The fidelity bar chart shows StarPreRouting substantially ahead of even the best SABRE heuristic, while basic sits well below the rest because its much deeper circuits accumulate the most error.
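To make the size of these gaps concrete, the sketch below computes ratios relative to StarPreRouting from the best-per-label numbers printed earlier in this tutorial. The dictionary `reported`, the key `mean_zz`, and the helper `ratios_vs` are names introduced here for illustration. Treat the mean of the measured correlations as a proxy for fidelity, not a rigorous fidelity estimate.

```python
# Best-per-label values copied from the printed outputs in this tutorial.
reported = {
    "basic": {"depth": 524, "size": 3852, "mean_zz": 0.0344},
    "decay": {"depth": 387, "size": 2786, "mean_zz": 0.1298},
    "lookahead": {"depth": 364, "size": 2485, "mean_zz": 0.1857},
    "StarPreRouting": {"depth": 196, "size": 1151, "mean_zz": 0.3295},
}


def ratios_vs(reference, data):
    """Compare every label against `reference`.

    depth_ratio and size_ratio are (label / reference), so values above 1
    mean the label produced a deeper or larger circuit. zz_ratio is
    (reference / label), so values above 1 mean the reference measured a
    higher mean correlation.
    """
    ref = data[reference]
    return {
        label: {
            "depth_ratio": row["depth"] / ref["depth"],
            "size_ratio": row["size"] / ref["size"],
            "zz_ratio": ref["mean_zz"] / row["mean_zz"],
        }
        for label, row in data.items()
    }


ratios = ratios_vs("StarPreRouting", reported)
for label, r in ratios.items():
    print(
        f"{label}: depth x{r['depth_ratio']:.2f}, "
        f"size x{r['size_ratio']:.2f}, mean ZZ x{r['zz_ratio']:.2f}"
    )
```

Printing the ratios side by side makes it easy to see whether the gap in measured correlations tracks the gap in depth and gate count for each heuristic.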

Key takeaways:

Among SABRE heuristics, decay and lookahead are substantially better than basic for non-trivial circuits. Prefer one of the two for production workloads.
The best SABRE heuristic depends on your circuit and hardware. Testing multiple heuristics with multiple seeds is the most reliable strategy.
If you want to explore even more layouts, increase swap_trials (and layout_trials when you are not pinning a custom routing pass) rather than fanning the work out to remote nodes. The SABRE passes already parallelize trials across local threads, and the per-trial work is small enough that distribution overhead typically dominates any speedup.
When the circuit has a known special structure, applying a structure-aware pass like StarPreRouting before SABRE can deliver an improvement that SABRE tuning alone is unlikely to match. In this run, it roughly halved the best 2Q depth and gate count relative to the best SABRE heuristic. This is not a replacement for SABRE: StarPreRouting only helps when the circuit actually contains star sub-circuits and the backend has a long enough linear path. It is worth checking the pass library for matches whenever you know your circuit's shape.
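The "multiple heuristics with multiple seeds" strategy from the takeaways can be sketched as a small search loop. This is a minimal sketch, not the tutorial's implementation: `best_per_heuristic` and `fake_trial` are hypothetical names, and `fake_trial` returns made-up deterministic numbers in place of a real transpilation call. In practice, you would pass a function that runs your own pass manager for the given heuristic and seed and returns the measured metrics.

```python
from itertools import product


def best_per_heuristic(heuristics, seeds, run_trial, key="depth"):
    """Run every (heuristic, seed) pair and keep the lowest-`key` result.

    `run_trial(heuristic, seed)` is supplied by the caller and must return
    a dict that contains at least `key`.
    """
    best = {}
    for heuristic, seed in product(heuristics, seeds):
        # Record which seed produced this result alongside its metrics
        result = dict(run_trial(heuristic, seed), seed=seed)
        if heuristic not in best or result[key] < best[heuristic][key]:
            best[heuristic] = result
    return best


def fake_trial(heuristic, seed):
    """Stand-in for a transpilation run; the numbers are made up."""
    base = {"basic": 500, "decay": 400, "lookahead": 380}[heuristic]
    return {"depth": base + (seed * 7) % 30}


grid_best = best_per_heuristic(
    ["basic", "decay", "lookahead"], range(42, 47), fake_trial
)
for heuristic, result in grid_best.items():
    print(f"{heuristic}: depth={result['depth']} (seed={result['seed']})")
```

Keeping the transpilation call behind a plain function makes it easy to swap in a different metric (for example, gate count through `key="size"`) without changing the search loop.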

Next steps

If you found this work interesting, you might be interested in the following material:

Recommendations

SabreLayout API reference: full parameter documentation
SABRE paper: the original SABRE algorithm for layout and routing
LightSABRE paper: the algorithmic improvements that power Qiskit's current SABRE implementation
Write a custom transpiler pass: build your own transpilation logic
Transpiler plugins: extend Qiskit's transpilation pipeline with third-party passes
DAG representation: understand the directed acyclic graph used internally by the transpiler

Tutorial survey

Please take this short survey to provide feedback on this tutorial. Your insights will help us improve our content offerings and user experience.

Link to survey
