Quantum circuit optimization
Toshinari Itoko (21 June 2024)
Download the pdf of the original lecture. Note that some code snippets might become deprecated since these are static images.
Approximate QPU time to run this experiment is 15 s.
(Note: Some cells of part 2 are copied from the notebook "Qiskit Deep dive", written by Matthew Treinish (Qiskit maintainer))
# !pip install 'qiskit[visualization]'
# !pip install qiskit_ibm_runtime qiskit_aer
# !pip install jupyter
# !pip install matplotlib pylatexenc pydot pillow
import qiskit
qiskit.__version__
Output:
'2.0.2'
import qiskit_ibm_runtime
qiskit_ibm_runtime.__version__
Output:
'0.40.1'
import qiskit_aer
qiskit_aer.__version__
Output:
'0.17.1'
1. Introduction
This lesson will address several aspects of circuit optimization in quantum computing. Specifically, we will see the value of circuit optimization by using optimization settings built into Qiskit. Then we will go a bit deeper and see what you can do as an expert in your particular application area to build circuits in a smart way. Finally, we will take a close look at what goes on during transpilation that helps us optimize our circuits.
2. Circuit optimization matters
We first compare the results of running 5-qubit GHZ state ($\frac{1}{\sqrt{2}}(|00000\rangle + |11111\rangle)$) preparation circuits with and without optimization.
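For reference, the ideal outcome distribution of the GHZ circuit can be worked out by hand. The following is a minimal pure-Python sketch, independent of Qiskit, that tracks amplitudes in a dictionary. The helper names `apply_h` and `apply_cx` are our own, and the bitstring convention (qubit 0 is the leftmost character) is an assumption that differs from Qiskit's ordering but does not matter for the symmetric GHZ state.

```python
from math import sqrt


def apply_h(state, qubit):
    """Apply a Hadamard to `qubit` of a state stored as {bitstring: amplitude}."""
    new_state = {}
    for bits, amp in state.items():
        for new_bit in "01":
            # H maps |1> to (|0> - |1>)/sqrt(2): only the 1 -> 1 term is negative
            sign = -1.0 if (bits[qubit] == "1" and new_bit == "1") else 1.0
            key = bits[:qubit] + new_bit + bits[qubit + 1:]
            new_state[key] = new_state.get(key, 0.0) + sign * amp / sqrt(2)
    # Drop amplitudes that cancelled out
    return {k: v for k, v in new_state.items() if abs(v) > 1e-12}


def apply_cx(state, control, target):
    """Flip `target` in every basis state where `control` is 1."""
    new_state = {}
    for bits, amp in state.items():
        if bits[control] == "1":
            flipped = "1" if bits[target] == "0" else "0"
            bits = bits[:target] + flipped + bits[target + 1:]
        new_state[bits] = amp
    return new_state


num_qubits = 5
state = {"0" * num_qubits: 1.0}
state = apply_h(state, 0)
for target in range(1, num_qubits):
    state = apply_cx(state, 0, target)

# Only the all-zeros and all-ones bitstrings remain, each with probability 1/2
ideal_probs = {bits: abs(amp) ** 2 for bits, amp in state.items()}
print(ideal_probs)
```

Any deviation from this two-peak distribution in the histograms later in the lesson is due to noise.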
from qiskit.circuit import QuantumCircuit
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit.primitives import BackendSamplerV2 as Sampler
from qiskit_ibm_runtime.fake_provider import FakeKyiv
backend = FakeKyiv()
We first use a GHZ circuit naively synthesized as follows.
num_qubits = 5
ghz_circ = QuantumCircuit(num_qubits)
ghz_circ.h(0)
for i in range(1, num_qubits):
    ghz_circ.cx(0, i)
ghz_circ.measure_all()
ghz_circ.draw("mpl")
Output:

2.1 Optimization level
There are four available optimization_level values, from 0 to 3. The higher the optimization level, the more computational effort is spent optimizing the circuit. Level 0 performs no optimization and does only the minimal work needed to make the circuit runnable on the selected backend. Level 3 spends the most effort (and typically the most runtime) trying to optimize the circuit. Level 2 is the default optimization level in the Qiskit version used here (the default was level 1 before Qiskit 1.3).
We transpile the circuit without optimization (optimization_level=0) and with optimization (optimization_level=2).
We see a big difference in the circuit length of transpiled circuits.
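One way to make "circuit length" concrete is the circuit depth: the longest chain of gates that must run one after another. The sketch below computes depth and two-qubit gate count for a circuit written as a plain list of (name, qubits) tuples. That representation, the function names, and the "routed" variant with a single SWAP are all illustrative assumptions, not what the transpiler actually produces.

```python
def circuit_depth(gates, num_qubits):
    """Longest chain of gates that must be executed sequentially."""
    level = [0] * num_qubits  # layer index of the last gate seen on each qubit
    for _name, qubits in gates:
        layer = max(level[q] for q in qubits) + 1
        for q in qubits:
            level[q] = layer
    return max(level, default=0)


def count_two_qubit_gates(gates):
    return sum(1 for _name, qubits in gates if len(qubits) == 2)


# The naive GHZ circuit from above (measurements omitted)
naive_ghz = [("h", (0,))] + [("cx", (0, t)) for t in range(1, 5)]

# Hypothetical routed variant: a SWAP between qubits 0 and 1, written as
# three CX gates, is inserted before the last entangling gate.
routed_ghz = [
    ("h", (0,)),
    ("cx", (0, 1)),
    ("cx", (0, 2)),
    ("cx", (0, 3)),
    ("cx", (0, 1)), ("cx", (1, 0)), ("cx", (0, 1)),  # SWAP(0, 1)
    ("cx", (1, 4)),
]

for label, gates in [("naive", naive_ghz), ("routed", routed_ghz)]:
    print(label, circuit_depth(gates, 5), count_two_qubit_gates(gates))
```

In Qiskit, the corresponding numbers for a transpiled circuit can be read from `circ.depth()` and `circ.count_ops()`.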
pm0 = generate_preset_pass_manager(
optimization_level=0, backend=backend, seed_transpiler=777
)
pm2 = generate_preset_pass_manager(
optimization_level=2, backend=backend, seed_transpiler=777
)
circ0 = pm0.run(ghz_circ)
circ2 = pm2.run(ghz_circ)
print("optimization_level=0:")
display(circ0.draw("mpl", idle_wires=False, fold=-1))
print("optimization_level=2:")
display(circ2.draw("mpl", idle_wires=False, fold=-1))
Output:
optimization_level=0:

optimization_level=2:

2.2 Exercise
Try optimization_level=1 as well, by modifying the code above, and compare the resulting circuit with the two above.
Solution:
pm1 = generate_preset_pass_manager(
optimization_level=1, backend=backend, seed_transpiler=777
)
circ1 = pm1.run(ghz_circ)
print("optimization_level=1:")
display(circ1.draw("mpl", idle_wires=False, fold=-1))
Output:
optimization_level=1:

Run on a fake backend (noisy simulation). See Appendix 1 for how to run on a real backend.
# run the circuits on the fake backend (noisy simulator)
sampler = Sampler(backend=backend)
job = sampler.run([circ0, circ2], shots=10000)
print(f"Job ID: {job.job_id()}")
Output:
Job ID: 93a4ac70-e3ea-44ad-aea9-5045840c9076
# get results
result = job.result()
unoptimized_result = result[0].data.meas.get_counts()
optimized_result = result[1].data.meas.get_counts()
from qiskit.visualization import plot_histogram
# plot
sim_result = {"0" * 5: 0.5, "1" * 5: 0.5}
plot_histogram(
    [sim_result, unoptimized_result, optimized_result],
bar_labels=False,
legend=[
"ideal",
"no optimization",
"with optimization",
],
)
Output:

3. Circuit synthesis matters
We next compare the results of running two differently synthesized 5-qubit GHZ state ($\frac{1}{\sqrt{2}}(|00000\rangle + |11111\rangle)$) preparation circuits.
# Original GHZ circuit (naive synthesis)
ghz_circ.draw("mpl")
Output:

# A cleverly-synthesized GHZ circuit
ghz_circ2 = QuantumCircuit(5)
ghz_circ2.h(2)
ghz_circ2.cx(2, 1)
ghz_circ2.cx(2, 3)
ghz_circ2.cx(1, 0)
ghz_circ2.cx(3, 4)
ghz_circ2.measure_all()
ghz_circ2.draw("mpl")
Output:

# transpile both with the same optimization level 2
circ_org = pm2.run(ghz_circ)
circ_new = pm2.run(ghz_circ2)
print("original synthesis:")
display(circ_org.draw("mpl", idle_wires=False, fold=-1))
print("new synthesis:")
display(circ_new.draw("mpl", idle_wires=False, fold=-1))
Output:
original synthesis:

new synthesis:

The new synthesis produces a shallower circuit. Why?
This is because the new circuit only needs linearly connected qubits, so it can be laid out directly on IBM Kyiv's heavy-hexagon coupling graph. The original circuit requires star-shaped connectivity (a degree-4 node) and hence cannot be laid out on the heavy-hex coupling graph, whose nodes have degree at most 3. As a result, the original circuit requires qubit routing, which adds SWAP gates and increases the gate count.
What we have done in the new circuit can be seen as a manual "coupling constraint-aware" circuit synthesis. In other words: manually solving circuit synthesis and circuit mapping at the same time.
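The degree argument above can be checked mechanically. The following sketch builds the qubit interaction graph of each circuit from its list of CX gates and compares the largest node degree with the heavy-hex limit of 3 stated above. The function name is our own, and note that a small enough degree is only a necessary condition for a routing-free layout, not a sufficient one.

```python
from collections import defaultdict


def interaction_degrees(two_qubit_gates):
    """Map each qubit to the number of distinct qubits it interacts with."""
    neighbours = defaultdict(set)
    for a, b in two_qubit_gates:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {qubit: len(others) for qubit, others in neighbours.items()}


HEAVY_HEX_MAX_DEGREE = 3  # as stated in the text above

naive_cx = [(0, 1), (0, 2), (0, 3), (0, 4)]  # star centred on qubit 0
new_cx = [(2, 1), (2, 3), (1, 0), (3, 4)]    # path 0-1-2-3-4

for label, gates in [("original", naive_cx), ("new", new_cx)]:
    max_degree = max(interaction_degrees(gates).values())
    verdict = "may fit without SWAPs" if max_degree <= HEAVY_HEX_MAX_DEGREE else "needs routing"
    print(f"{label}: max degree {max_degree} -> {verdict}")
```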
# run the circuits
sampler = Sampler(backend=backend)
job = sampler.run([circ_org, circ_new], shots=10000)
print(f"Job ID: {job.job_id()}")
Output:
Job ID: 19d635b0-4d8b-44c2-a76e-49e4b9078b1b
# get results
result = job.result()
synthesis_org_result = result[0].data.meas.get_counts()
synthesis_new_result = result[1].data.meas.get_counts()
# plot
sim_result = {"0" * 5: 0.5, "1" * 5: 0.5}
plot_histogram(
    [
        sim_result,
        unoptimized_result,
        synthesis_org_result,
        synthesis_new_result,
    ],
bar_labels=False,
legend=[
"ideal",
"no optimization",
"synthesis_org",
"synthesis_new",
],
)
Output:

In general, circuit synthesis depends on the application, and it is too difficult for any software to cover all possible applications. The Qiskit transpiler happens to have no function for synthesizing GHZ state preparation circuits. In such cases, manual circuit synthesis as shown above is worth considering.
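To put a single number on the histogram comparisons above, one simple option is the fraction of shots that land on the two ideal GHZ bitstrings. The sketch below assumes counts in the dictionary format returned by get_counts(). The function name and the example counts are made up for illustration and are not results from this lesson.

```python
def ghz_success_probability(counts, num_qubits=5):
    """Fraction of shots measured as all zeros or all ones."""
    shots = sum(counts.values())
    if shots == 0:
        raise ValueError("counts must contain at least one shot")
    good = counts.get("0" * num_qubits, 0) + counts.get("1" * num_qubits, 0)
    return good / shots


# Hypothetical counts, for illustration only
example_counts = {"00000": 4700, "11111": 4500, "00001": 300, "10111": 500}
example_score = ghz_success_probability(example_counts)
print(example_score)
```

A value closer to 1 means the measured distribution is closer to the ideal one. This metric ignores the relative phase of the two GHZ components, so it is only a rough proxy for state quality.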
In this section, we look into the details of how the Qiskit transpiler works, using the following toy example circuit.
# Build a toy example circuit
from math import pi
import itertools
from qiskit.circuit import QuantumCircuit
from qiskit.circuit.library import excitation_preserving
circuit = QuantumCircuit(4, name="Example circuit")
circuit.append(excitation_preserving(4, reps=1, flatten=True), range(4))
circuit.measure_all()
value_cycle = itertools.cycle([0, pi / 4, pi / 2, 3 * pi / 4, pi, 2 * pi])
# Take one value from the cycle for each circuit parameter
circuit.assign_parameters(
    list(itertools.islice(value_cycle, len(circuit.parameters))), inplace=True
)
circuit.draw("mpl")
Output:

3.1 Draw the entire Qiskit transpilation flow
We look into the transpiler passes (tasks) for optimization_level=1.
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
# There is no need to read this entire image, but this outputs all the steps in the transpile() call
# for optimization level 1
pm = generate_preset_pass_manager(1, backend, seed_transpiler=42)
pm.draw()
Output:

The flow consists of six stages:
print(pm.stages)
Output:
('init', 'layout', 'routing', 'translation', 'optimization', 'scheduling')
3.2 Draw an individual stage
First, let's draw all the tasks (transpiler passes) done in the init stage.
pm.init.draw()
Output:

We can run each stage individually. Let's run the init stage on our circuit. With logging enabled, we can see the details of the run.
import logging
logger = logging.getLogger()
logger.setLevel("INFO")
init_out = pm.init.run(circuit)
init_out.draw("mpl", fold=-1)
Output:
INFO:qiskit.passmanager.base_tasks:Pass: UnitarySynthesis - 0.03576 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: HighLevelSynthesis - 0.16618 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: BasisTranslator - 0.07176 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: InverseCancellation - 0.27299 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: ContractIdleWiresInControlFlow - 0.00811 (ms)

3.3 Exercise
Draw the layout stage passes and run the stage on the output circuit of the init stage (init_out), by modifying the cells used above.
Solution:
display(pm.layout.draw())
layout_out = pm.layout.run(init_out)
layout_out.draw("mpl", idle_wires=False, fold=-1)
Output:

INFO:qiskit.passmanager.base_tasks:Pass: SetLayout - 0.01001 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: TrivialLayout - 0.07129 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: CheckMap - 0.08917 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: VF2Layout - 1.24431 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: BarrierBeforeFinalMeasurements - 0.02599 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: SabreLayout - 5.11169 (ms)

Do the same thing for the translation stage.
Solution:
display(pm.translation.draw())
basis_out = pm.translation.run(layout_out)
basis_out.draw("mpl", idle_wires=False, fold=-1)
Output:

INFO:qiskit.passmanager.base_tasks:Pass: UnitarySynthesis - 0.03386 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: HighLevelSynthesis - 0.02718 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: BasisTranslator - 2.64192 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: CheckGateDirection - 0.02217 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: GateDirection - 0.36502 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: BasisTranslator - 0.64778 (ms)

Note: A stage cannot always be run on its own, because some stages rely on information carried over from an earlier stage.
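The note above can be illustrated with a toy pipeline in which stages communicate through a shared dictionary. This is only a sketch of the idea: the stage functions, the `props` dictionary, and the circuit representation are hypothetical and much simpler than Qiskit's pass manager and its property set.

```python
def layout_stage(circuit, props):
    # Record which physical qubit each virtual qubit is placed on
    # (a made-up mapping: virtual qubit v goes to physical qubit v + 10).
    props["layout"] = {v: v + 10 for v in range(circuit["num_qubits"])}
    return circuit


def routing_stage(circuit, props):
    # This stage depends on information written by the layout stage
    if "layout" not in props:
        raise RuntimeError("routing needs the layout chosen by an earlier stage")
    layout = props["layout"]
    gates = [
        (name, tuple(layout[q] for q in qubits))
        for name, qubits in circuit["gates"]
    ]
    return {"num_qubits": circuit["num_qubits"], "gates": gates}


def run_stages(stages, circuit):
    props = {}  # fresh shared state for every run
    for stage in stages:
        circuit = stage(circuit, props)
    return circuit, props


toy = {"num_qubits": 2, "gates": [("h", (0,)), ("cx", (0, 1))]}

routed, props = run_stages([layout_stage, routing_stage], toy)
print(routed["gates"])

standalone_error = None
try:
    run_stages([routing_stage], toy)  # no layout has been recorded
except RuntimeError as err:
    standalone_error = str(err)
    print("standalone run failed:", standalone_error)
```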
3.4 Optimization Stage
The last default stage in the pipeline is optimization. After the circuit has been embedded for the target, it has expanded quite a bit. Most of this is due to inefficiencies in the equivalence relationships used in basis translation and to swap insertion. The optimization stage tries to minimize the size and depth of the circuit. It runs a series of passes in a do-while loop until the output stops changing.
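The "repeat until nothing changes" structure can be sketched with a toy cancellation pass. Below, one sweep removes neighbouring pairs of identical self-inverse gates, and the outer loop reruns the sweep until the gate count stops shrinking. The gate-list representation and function names are our own assumptions. Qiskit's actual loop tracks both depth and size and runs several different passes per iteration.

```python
SELF_INVERSE = {"h", "x", "cx"}  # gates that undo themselves when applied twice


def cancel_adjacent_inverses(gates):
    """One left-to-right sweep that drops neighbouring identical self-inverse gates."""
    out, i = [], 0
    while i < len(gates):
        same_as_next = i + 1 < len(gates) and gates[i] == gates[i + 1]
        if same_as_next and gates[i][0] in SELF_INVERSE:
            i += 2  # the pair multiplies to the identity, so skip both
        else:
            out.append(gates[i])
            i += 1
    return out


def optimize_until_fixed_point(gates):
    """Rerun the sweep until the circuit size no longer changes."""
    iterations = 0
    while True:
        new_gates = cancel_adjacent_inverses(gates)
        iterations += 1
        if len(new_gates) == len(gates):
            return new_gates, iterations
        gates = new_gates


toy_gates = [("h", (0,)), ("cx", (0, 1)), ("cx", (0, 1)), ("h", (0,)), ("x", (1,))]
final_gates, iterations = optimize_until_fixed_point(toy_gates)
print(final_gates, iterations)
```

The first sweep removes the two CX gates, which brings the two H gates next to each other. Only the second sweep can remove them, which is why a single pass over the circuit is not always enough.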
# pm.pre_optimization.draw()
pm.optimization.draw()
Output:

logger = logging.getLogger()
logger.setLevel("INFO")
opt_out = pm.optimization.run(basis_out)
Output:
INFO:qiskit.passmanager.base_tasks:Pass: Depth - 0.30112 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: FixedPoint - 0.03195 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Size - 0.01216 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: FixedPoint - 0.01001 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Optimize1qGatesDecomposition - 0.63729 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: InverseCancellation - 0.41723 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: ContractIdleWiresInControlFlow - 0.01192 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: GatesInBasis - 0.05484 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Depth - 0.08583 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: FixedPoint - 0.20599 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Size - 0.00787 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: FixedPoint - 0.00715 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Optimize1qGatesDecomposition - 0.16809 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: InverseCancellation - 0.17190 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: ContractIdleWiresInControlFlow - 0.00691 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: GatesInBasis - 0.02408 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Depth - 0.04935 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: FixedPoint - 0.00525 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: Size - 0.00620 (ms)
INFO:qiskit.passmanager.base_tasks:Pass: FixedPoint - 0.00286 (ms)
opt_out.draw("mpl", idle_wires=False, fold=-1)
Output:

4. In-depth examples
4.1 Two-qubit block optimization using two-qubit unitary synthesis
For levels 2 and 3, there are additional passes (Collect2qBlocks, ConsolidateBlocks, UnitarySynthesis) that perform two-qubit block optimization. (Compare the optimization stage flow for level 2 with the one above for level 1.)
The two-qubit block optimization consists of two steps: collecting and consolidating two-qubit blocks, and then synthesizing the resulting two-qubit unitary matrices.
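As a rough illustration of the collecting step, the sketch below groups consecutive gates that act within the same pair of qubits. It then flags a block as worth resynthesizing when it contains more than three CX gates, since any two-qubit unitary can be implemented with at most three CX gates plus single-qubit gates. The list-based scan and the function names are simplifying assumptions. Qiskit's Collect2qBlocks works on the circuit DAG and can find blocks this scan would miss.

```python
def collect_2q_runs(gates):
    """Split a gate list into maximal consecutive runs supported on at most two qubits."""
    blocks = []
    current, support = [], set()
    for gate in gates:
        _name, qubits = gate
        if len(support | set(qubits)) <= 2:
            current.append(gate)
            support |= set(qubits)
        else:
            blocks.append(current)
            current, support = [gate], set(qubits)
    if current:
        blocks.append(current)
    return blocks


def cx_count(block):
    return sum(1 for name, _qubits in block if name == "cx")


toy_gates = [
    ("cx", (0, 1)), ("rz", (1,)), ("cx", (0, 1)), ("sx", (0,)),
    ("cx", (0, 1)), ("rz", (1,)), ("cx", (0, 1)),
    ("cx", (1, 2)), ("rz", (2,)),
]

blocks = collect_2q_runs(toy_gates)
# More than three CX gates in a block means resynthesis is guaranteed to save CX gates
worth_resynthesis = [cx_count(block) > 3 for block in blocks]
for block, flag in zip(blocks, worth_resynthesis):
    print(len(block), "gates,", cx_count(block), "CX, resynthesize:", flag)
```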
pm2 = generate_preset_pass_manager(2, backend, seed_transpiler=42)
pm2.optimization.draw()
Output:

from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import (
Collect2qBlocks,
ConsolidateBlocks,
UnitarySynthesis,
)
# Collect 2q blocks and consolidate to unitary when we expect that we can reduce the 2q gate count for that unitary
consolidate_pm = PassManager(
[
Collect2qBlocks(),
ConsolidateBlocks(target=backend.target),
]
)
display(basis_out.draw("mpl", idle_wires=False, fold=-1))
consolidated = consolidate_pm.run(basis_out)
consolidated.draw("mpl", idle_wires=False, fold=-1)
Output:


# Synthesize unitaries
UnitarySynthesis(target=backend.target)(consolidated).draw(
"mpl", idle_wires=False, fold=-1
)
Output:

logger.setLevel("WARNING")
We saw in Part 2 that the real quantum compiler flow is not that simple and is composed of many passes (tasks). This is mainly due to the software engineering required to ensure performance for a wide range of application circuits and to keep the software maintainable. The Qiskit transpiler works well in most cases, but if you find that your circuit is not well optimized by it, that is a good opportunity to research your own application-specific circuit optimization, as shown in Part 1. Transpiler technology is still evolving, and your R&D contributions are welcome.
from qiskit.circuit import QuantumCircuit
from qiskit_ibm_runtime import QiskitRuntimeService, Sampler
service = QiskitRuntimeService()
backend = service.backend("ibm_sherbrooke")
sampler = Sampler(backend)
circ = QuantumCircuit(3)
circ.ccx(0, 1, 2)
circ.measure_all()
circ.draw("mpl")
Output:

sampler.run([circ]) # IBMInputValueError will be raised
Output:
---------------------------------------------------------------------------
IBMInputValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 sampler.run([circ]) # IBMInputValueError will be raised
File /opt/homebrew/Caskroom/miniforge/base/envs/doc/lib/python3.11/site-packages/qiskit_ibm_runtime/sampler.py:111, in SamplerV2.run(self, pubs, shots)
107 coerced_pubs = [SamplerPub.coerce(pub, shots) for pub in pubs]
109 validate_classical_registers(coerced_pubs)
--> 111 return self._run(coerced_pubs)
File /opt/homebrew/Caskroom/miniforge/base/envs/doc/lib/python3.11/site-packages/qiskit_ibm_runtime/base_primitive.py:158, in BasePrimitiveV2._run(self, pubs)
156 for pub in pubs:
157 if getattr(self._backend, "target", None) and not is_simulator(self._backend):
--> 158 validate_isa_circuits([pub.circuit], self._backend.target)
160 if isinstance(self._backend, IBMBackend):
161 self._backend.check_faulty(pub.circuit)
File /opt/homebrew/Caskroom/miniforge/base/envs/doc/lib/python3.11/site-packages/qiskit_ibm_runtime/utils/validations.py:96, in validate_isa_circuits(circuits, target)
94 message = is_isa_circuit(circuit, target)
95 if message:
---> 96 raise IBMInputValueError(
97 message
98 + " Circuits that do not match the target hardware definition are no longer "
99 "supported after March 4, 2024. See the transpilation documentation "
100 "(https://quantum.cloud.ibm.com/docs/guides/transpile) for instructions "
101 "to transform circuits and the primitive examples "
102 "(https://quantum.cloud.ibm.com/docs/guides/primitives-examples) to see "
103 "this coupled with operator transformations."
104 )
IBMInputValueError: 'The instruction ccx on qubits (0, 1, 2) is not supported by the target system. Circuits that do not match the target hardware definition are no longer supported after March 4, 2024. See the transpilation documentation (https://quantum.cloud.ibm.com/docs/guides/transpile) for instructions to transform circuits and the primitive examples (https://quantum.cloud.ibm.com/docs/guides/primitives-examples) to see this coupled with operator transformations.'
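The error above is raised because ccx is not an instruction the target hardware supports, so the circuit must be transpiled before it is passed to the sampler. A much-simplified sketch of this kind of check follows. The supported-instruction set is an assumption for illustration (the real one should be read from backend.target), and `first_unsupported` is a hypothetical helper, not the qiskit_ibm_runtime implementation, which checks more than instruction names.

```python
# Assumed instruction names for illustration; query backend.target for the real set
ASSUMED_SUPPORTED = {"ecr", "id", "rz", "sx", "x", "measure", "barrier"}


def first_unsupported(instructions, supported):
    """Return a message for the first instruction whose name is not supported, else None."""
    for name, qubits in instructions:
        if name not in supported:
            return (
                f"The instruction {name} on qubits {qubits} "
                "is not supported by the target system."
            )
    return None


# The untranspiled Toffoli circuit from above, as (name, qubits) pairs
raw_circuit = [("ccx", (0, 1, 2)), ("barrier", (0, 1, 2)),
               ("measure", (0,)), ("measure", (1,)), ("measure", (2,))]

# A circuit that only uses instructions from the assumed set
isa_like_circuit = [("rz", (0,)), ("sx", (0,)), ("ecr", (0, 1)), ("measure", (0,))]

raw_message = first_unsupported(raw_circuit, ASSUMED_SUPPORTED)
isa_message = first_unsupported(isa_like_circuit, ASSUMED_SUPPORTED)
print(raw_message)
print(isa_message)
```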
4.2 Circuit optimization matters
We first compare the results of running 5-qubit GHZ state ($\frac{1}{\sqrt{2}}(|00000\rangle + |11111\rangle)$) preparation circuits with and without optimization.
from qiskit.circuit import QuantumCircuit
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import QiskitRuntimeService, Sampler
service = QiskitRuntimeService()
# backend = service.backend('ibm_sherbrooke')
backend = service.least_busy(
operational=True, simulator=False, min_num_qubits=127
) # Eagle
backend
We first use a GHZ circuit naively synthesized as follows.
num_qubits = 5
ghz_circ = QuantumCircuit(num_qubits)
ghz_circ.h(0)
for i in range(1, num_qubits):
    ghz_circ.cx(0, i)
ghz_circ.measure_all()
ghz_circ.draw("mpl")
Output:

We transpile the circuit without optimization (optimization_level=0) and with optimization (optimization_level=2).
As you can see, there is a big difference in the circuit length of transpiled circuits.
pm0 = generate_preset_pass_manager(
optimization_level=0, backend=backend, seed_transpiler=777
)
pm2 = generate_preset_pass_manager(
optimization_level=2, backend=backend, seed_transpiler=777
)
circ0 = pm0.run(ghz_circ)
circ2 = pm2.run(ghz_circ)
print("optimization_level=0:")
display(circ0.draw("mpl", idle_wires=False, fold=-1))
print("optimization_level=2:")
display(circ2.draw("mpl", idle_wires=False, fold=-1))
Output:
optimization_level=0:

optimization_level=2:

# run the circuits
sampler = Sampler(backend)
job = sampler.run([circ0, circ2], shots=10000)
job_id = job.job_id()
print(f"Job ID: {job_id}")
Output:
Job ID: d13rnnemya70008ek1zg
# REPLACE WITH YOUR OWN JOB IDS
job = service.job(job_id)
# get results
result = job.result()
unoptimized_result = result[0].data.meas.get_counts()
optimized_result = result[1].data.meas.get_counts()
from qiskit.visualization import plot_histogram
# plot
sim_result = {"0" * 5: 0.5, "1" * 5: 0.5}
plot_histogram(
    [sim_result, unoptimized_result, optimized_result],
bar_labels=False,
legend=[
"ideal",
"no optimization",
"with optimization",
],
)
Output:

4.3 Circuit synthesis matters
We next compare the results of running two differently synthesized 5-qubit GHZ state ($\frac{1}{\sqrt{2}}(|00000\rangle + |11111\rangle)$) preparation circuits.
# Original GHZ circuit (naive synthesis)
ghz_circ.draw("mpl")
Output:

# A better GHZ circuit (smarter synthesis), you learned in a previous lecture
ghz_circ2 = QuantumCircuit(5)
ghz_circ2.h(2)
ghz_circ2.cx(2, 1)
ghz_circ2.cx(2, 3)
ghz_circ2.cx(1, 0)
ghz_circ2.cx(3, 4)
ghz_circ2.measure_all()
ghz_circ2.draw("mpl")
Output:

circ_org = pm2.run(ghz_circ)
circ_new = pm2.run(ghz_circ2)
print("original synthesis:")
display(circ_org.draw("mpl", idle_wires=False, fold=-1))
print("new synthesis:")
display(circ_new.draw("mpl", idle_wires=False, fold=-1))
Output:
original synthesis:

new synthesis:

# run the circuits
sampler = Sampler(backend)
job = sampler.run([circ_org, circ_new], shots=10000)
job_id = job.job_id()
print(f"Job ID: {job_id}")
Output:
Job ID: d13rp283grvg008j12fg
# REPLACE WITH YOUR OWN JOB IDS
job = service.job(job_id)
# get results
result = job.result()
synthesis_org_result = result[0].data.meas.get_counts()
synthesis_new_result = result[1].data.meas.get_counts()
# plot
sim_result = {"0" * 5: 0.5, "1" * 5: 0.5}
plot_histogram(
    [sim_result, synthesis_org_result, synthesis_new_result],
bar_labels=False,
legend=[
"ideal",
"synthesis_org",
"synthesis_new",
],
)
Output:

4.4 General 1-qubit gate decomposition
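The cells in this subsection ask Qiskit to rewrite a general single-qubit gate using only rz and sx. As a cross-check that does not depend on Qiskit, the sketch below verifies numerically that U(θ, φ, λ) equals RZ(φ+π)·SX·RZ(θ+π)·SX·RZ(λ) up to a global phase. The matrices are the usual textbook definitions of these gates and the helper names are our own. The angles Qiskit prints may differ from these by multiples of 2π.

```python
import cmath
from math import cos, sin, pi


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rz(angle):
    return [[cmath.exp(-0.5j * angle), 0], [0, cmath.exp(0.5j * angle)]]


SX = [[0.5 + 0.5j, 0.5 - 0.5j], [0.5 - 0.5j, 0.5 + 0.5j]]  # square root of X


def u_gate(theta, phi, lam):
    return [
        [cos(theta / 2), -cmath.exp(1j * lam) * sin(theta / 2)],
        [cmath.exp(1j * phi) * sin(theta / 2), cmath.exp(1j * (phi + lam)) * cos(theta / 2)],
    ]


def zsx_product(theta, phi, lam):
    # RZ(lam) acts first, so it is the rightmost factor of the matrix product
    m = rz(lam)
    for factor in (SX, rz(theta + pi), SX, rz(phi + pi)):
        m = matmul(factor, m)
    return m


def equal_up_to_global_phase(a, b, tol=1e-9):
    # For 2x2 unitaries, |Tr(a^dagger b)| = 2 exactly when b = exp(i*alpha) * a
    overlap = sum(a[i][j].conjugate() * b[i][j] for i in range(2) for j in range(2))
    return abs(abs(overlap) - 2) < tol


angle_sets = [(0.3, 1.1, -0.7), (pi / 2, 0.0, pi), (1.23, 2.5, 0.4)]
checks = [equal_up_to_global_phase(u_gate(*a), zsx_product(*a)) for a in angle_sets]
print(checks)
```

This shows why three rz rotations and two sx gates are always enough for a single-qubit gate in this basis.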
from qiskit import QuantumCircuit, transpile
from qiskit.circuit import Parameter
from qiskit.circuit.library.standard_gates import UGate
phi, theta, lam = Parameter("φ"), Parameter("θ"), Parameter("λ")
qc = QuantumCircuit(1)
qc.append(UGate(theta, phi, lam), [0])
qc.draw(output="mpl")
Output:

transpile(qc, basis_gates=["rz", "sx"]).draw(output="mpl")
Output:

4.5 One-qubit block optimization
from qiskit import QuantumCircuit
qc = QuantumCircuit(1)
qc.x(0)
qc.y(0)
qc.z(0)
qc.rx(1.23, 0)
qc.ry(1.23, 0)
qc.rz(1.23, 0)
qc.h(0)
qc.s(0)
qc.t(0)
qc.sx(0)
qc.sdg(0)
qc.tdg(0)
qc.draw(output="mpl")
Output:

from qiskit.quantum_info import Operator
Operator(qc)
Output:
Operator([[ 0.45292511-0.57266982j, -0.66852684-0.14135058j],
[ 0.14135058+0.66852684j, -0.57266982+0.45292511j]],
input_dims=(2,), output_dims=(2,))
from qiskit import transpile
qc_opt = transpile(qc, basis_gates=["rz", "sx"])
qc_opt.draw(output="mpl")
Output:

Operator(qc_opt)
Output:
Operator([[ 0.45292511-0.57266982j, -0.66852684-0.14135058j],
[ 0.14135058+0.66852684j, -0.57266982+0.45292511j]],
input_dims=(2,), output_dims=(2,))
Operator(qc).equiv(Operator(qc_opt))
Output:
True
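The reason twelve gates can shrink so much is that any run of single-qubit gates multiplies out to a single 2x2 unitary, however long the run is. The sketch below multiplies the same gate sequence by hand, assuming the standard matrix definitions of these gates (the helper names are our own), and confirms that the result is unitary.

```python
import cmath
from math import cos, sin, pi, sqrt


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def rx(t):
    return [[cos(t / 2), -1j * sin(t / 2)], [-1j * sin(t / 2), cos(t / 2)]]


def ry(t):
    return [[cos(t / 2), -sin(t / 2)], [sin(t / 2), cos(t / 2)]]


def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


def phase(a):
    return [[1, 0], [0, cmath.exp(1j * a)]]


X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
SX = [[0.5 + 0.5j, 0.5 - 0.5j], [0.5 - 0.5j, 0.5 + 0.5j]]
S, T, SDG, TDG = phase(pi / 2), phase(pi / 4), phase(-pi / 2), phase(-pi / 4)

# Same order as the circuit above: the first gate applied is first in the list
sequence = [X, Y, Z, rx(1.23), ry(1.23), rz(1.23), H, S, T, SX, SDG, TDG]

combined = [[1, 0], [0, 1]]
for gate in sequence:
    combined = matmul(gate, combined)  # later gates multiply from the left

# A unitary matrix satisfies U^dagger U = identity
product = matmul(dagger(combined), combined)
is_unitary = all(
    abs(product[i][j] - (1 if i == j else 0)) < 1e-9
    for i in range(2)
    for j in range(2)
)
print(combined)
print(is_unitary)
```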
4.6 Toffoli decomposition
qc = QuantumCircuit(3)
qc.ccx(0, 1, 2)
qc.draw(output="mpl")
Output:

from qiskit import QuantumCircuit, transpile
qc = QuantumCircuit(3)
qc.ccx(0, 1, 2)
qc = transpile(qc, basis_gates=["rz", "sx", "cx"])
qc.draw(output="mpl")
Output:

4.7 CU gate decomposition
from qiskit.circuit.library.standard_gates import CUGate
phi, theta, lam, gamma = Parameter("φ"), Parameter("θ"), Parameter("λ"), Parameter("γ")
qc = QuantumCircuit(2)
# qc.cu(theta, phi, lam, gamma, 0, 1)
qc.append(CUGate(theta, phi, lam, gamma), [0, 1])
qc.draw(output="mpl")
Output:

from qiskit.circuit.library.standard_gates import CUGate
phi, theta, lam, gamma = Parameter("φ"), Parameter("θ"), Parameter("λ"), Parameter("γ")
qc = QuantumCircuit(2)
qc.append(CUGate(theta, phi, lam, gamma), [0, 1])
qc = transpile(qc, basis_gates=["rz", "sx", "cx"])
qc.draw(output="mpl")
Output:

4.8 CX, ECR, CZ equal up to local Cliffords
Note that $H$ (Hadamard), $S$ ($\pi/2$ Z-rotation), $S^\dagger$ ($-\pi/2$ Z-rotation), and $X$ (Pauli X) are all Clifford gates.
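As a concrete instance of "equal up to local Cliffords", CX can be obtained from CZ by placing a Hadamard on the target qubit before and after it. The sketch below checks this identity with plain 4x4 matrices. It assumes the textbook ordering in which the first tensor factor is the control and the second is the target, which differs from Qiskit's little-endian convention, and the helper names are our own.

```python
from math import sqrt


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
        for i in range(len(a))
        for k in range(len(b))
    ]


I2 = [[1, 0], [0, 1]]
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]

CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

h_on_target = kron(I2, H)  # identity on the control, Hadamard on the target
rebuilt_cx = matmul(h_on_target, matmul(CZ, h_on_target))

matches = all(
    abs(rebuilt_cx[i][j] - CX[i][j]) < 1e-9 for i in range(4) for j in range(4)
)
print(matches)
```

The transpiled circuits below show the same kind of relation, with the single-qubit corrections expressed in whichever basis gates are requested.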
qc = QuantumCircuit(2)
qc.cx(0, 1)
qc.draw(output="mpl", style="bw")
Output:

qc = QuantumCircuit(2)
qc.cx(0, 1)
transpile(qc, basis_gates=["x", "s", "h", "sdg", "ecr"]).draw(output="mpl", style="bw")
Output:

qc = QuantumCircuit(2)
qc.cx(0, 1)
transpile(qc, basis_gates=["h", "cz"]).draw(output="mpl", style="bw")
Output:

The same conversions, now using the IBM backend single-qubit basis gates "rz", "sx", and "x":
qc = QuantumCircuit(2)
qc.cx(0, 1)
transpile(qc, basis_gates=["rz", "sx", "x", "ecr"]).draw(output="mpl", style="bw")
Output:

qc = QuantumCircuit(2)
qc.cx(0, 1)
transpile(qc, basis_gates=["rz", "sx", "x", "cz"]).draw(output="mpl", style="bw")
Output:
