SQD for energy estimation of a chemistry Hamiltonian

In this lesson, we will apply SQD to estimate the ground state energy of a molecule.


Step 1: Map problem to quantum circuits and operators
- Set up the molecular Hamiltonian for $N_2$.
- Explain the chemistry-inspired and hardware-friendly local unitary cluster Jastrow (LUCJ) ansatz [1].
Step 2: Optimize for target hardware
- Optimize gate counts and layout of the ansatz for hardware execution
Step 3: Execute on target hardware
- Run the optimized circuit on a real QPU to generate samples of the subspace.
Step 4: Post-process results
- Introduce the self-consistent configuration recovery loop [2]
  - Post-process the full set of bitstring samples, using prior knowledge of particle number and the average orbital occupancy calculated on the most recent iteration.
  - Probabilistically create batches of subsamples from recovered bitstrings.
  - Project and diagonalize the molecular Hamiltonian over each sampled subspace.
  - Save the minimum energy found across all batches and update the average orbital occupancy.

We will use several software packages throughout the lesson.

PySCF to define the molecule and set up the Hamiltonian.
The ffsim package to construct the LUCJ ansatz.
Qiskit to transpile the ansatz for hardware execution.
Qiskit IBM Runtime to execute the circuit on a QPU and collect samples.
Qiskit addon SQD for configuration recovery and ground state energy estimation using subspace projection and matrix diagonalization.

1. Map problem to quantum circuits and operators

Molecular Hamiltonian

A molecular Hamiltonian takes the generic form:

\hat{H} = \sum_{ \substack{pr\\\sigma} } h_{pr} \, \hat{a}^\dagger_{p\sigma} \hat{a}_{r\sigma} + \sum_{ \substack{prqs\\\sigma\tau} } \frac{(pr|qs)}{2} \, \hat{a}^\dagger_{p\sigma} \hat{a}^\dagger_{q\tau} \hat{a}_{s\tau} \hat{a}_{r\sigma}

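Here $\hat{a}^\dagger_{p\sigma}$ and $\hat{a}_{p\sigma}$ are the fermionic creation and annihilation operators for orbital $p$ with spin $\sigma$, and $h_{pr}$ and $(pr|qs)$ are the one- and two-body electronic integrals.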
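To make the roles of $h_{pr}$ and $(pr|qs)$ concrete, the sketch below evaluates the expectation value of this Hamiltonian for a closed-shell determinant, using the standard expression $E = 2\sum_i h_{ii} + \sum_{ij} [2(ii|jj) - (ij|ji)]$ over doubly occupied orbitals. The integrals are made-up toy numbers and the helper `closed_shell_energy` is a hypothetical name, not part of PySCF.

```python
# Toy integrals for 2 spatial orbitals; all numbers are made up for illustration.
num_orb = 2
h = [[-1.25, 0.0], [0.0, -0.48]]  # one-body integrals h_pr

# Two-body integrals (pr|qs) in chemist notation, stored as eri[p][r][q][s].
eri = [
    [[[0.0] * num_orb for _ in range(num_orb)] for _ in range(num_orb)]
    for _ in range(num_orb)
]
eri[0][0][0][0] = 0.67  # Coulomb repulsion of two electrons in orbital 0


def closed_shell_energy(h, eri, n_occ):
    """Energy of a determinant with the lowest n_occ orbitals doubly occupied."""
    one_body = 2.0 * sum(h[i][i] for i in range(n_occ))
    two_body = sum(
        2.0 * eri[i][i][j][j] - eri[i][j][j][i]
        for i in range(n_occ)
        for j in range(n_occ)
    )
    return one_body + two_body


energy = closed_shell_energy(h, eri, n_occ=1)
print(f"Toy closed-shell energy: {energy:.2f}")
```

With one doubly occupied orbital, the result reduces to $2h_{00} + (00|00)$, which is what the $\sigma \neq \tau$ terms of the two-body sum contribute.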

import warnings
import pyscf
import pyscf.cc
import pyscf.mcscf
 
warnings.filterwarnings("ignore")
 
# Specify molecule properties
open_shell = False
spin_sq = 0
 
# Build N2 molecule
mol = pyscf.gto.Mole()
mol.build(
    atom=[["N", (0, 0, 0)], ["N", (1.0, 0, 0)]],  # Two N atoms 1 angstrom apart
    basis="6-31g",
    symmetry="Dooh",
)
 
# Define active space
n_frozen = 2
active_space = range(n_frozen, mol.nao_nr())
 
# Get molecular integrals
scf = pyscf.scf.RHF(mol).run()
num_orbitals = len(active_space)
n_electrons = int(sum(scf.mo_occ[active_space]))
num_elec_a = (n_electrons + mol.spin) // 2
num_elec_b = (n_electrons - mol.spin) // 2
cas = pyscf.mcscf.CASCI(scf, num_orbitals, (num_elec_a, num_elec_b))
mo = cas.sort_mo(active_space, base=0)
hcore, nuclear_repulsion_energy = cas.get_h1cas(mo)  # hcore: one-body integrals
eri = pyscf.ao2mo.restore(1, cas.get_h2cas(mo), num_orbitals)  # eri: two-body integrals
 
# Compute exact energy for comparison
exact_energy = cas.run().e_tot

Output:

converged SCF energy = -108.835236570774
CASCI E = -109.046671778080  E(CI) = -32.8155692383188  S^2 = 0.0000000


1.1 Quantum circuit for sample generation: The LUCJ ansatz

In this lesson, we will use the local unitary cluster Jastrow (LUCJ) [1] ansatz for quantum state preparation and subsequent sampling. First, we will explain the building blocks of the full UCJ ansatz and the approximations made in its local version. Next, using the ffsim package, we will construct the LUCJ ansatz and optimize it with the Qiskit transpiler for hardware execution.

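The UCJ ansatz with $L$ layers has the form

|\psi\rangle = \prod_{\mu=1}^{L}{(e^{K^{(\mu)}} \times {e^{iJ^{(\mu)}}} \times {e^{-K^{(\mu)}}})} |\Phi_{0}\rangle

where $\vert \Phi_{0} \rangle$ is the reference state, here the Hartree-Fock state.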

A circuit diagram showing 8 qubits, 4 called alpha orbitals and 4 called beta orbitals. The top two alpha and the top two beta have a "not" gate.

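Each layer $e^{K^{(\mu)}} e^{iJ^{(\mu)}} e^{-K^{(\mu)}}$ consists of a diagonal Coulomb evolution $e^{iJ^{(\mu)}}$ placed between two orbital rotations $e^{\pm K^{(\mu)}}$.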

A circuit diagram showing that the UCJ circuit can be broken down into rotation layers and a diagonal Coulomb evolution layer.


The 2-qubit gates act on adjacent spin-orbitals (nearest neighbor qubits), and therefore, are implementable on IBM QPUs without the need for SWAP gates.

A circuit diagram showing 4 alpha orbital qubits and 4 beta orbital qubits. The circuits start with R-Z gates, and then have a series of Givens rotation gates.

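The diagonal Coulomb evolution $e^{iJ^{(\mu)}}$ has a same-spin part, $J_{\alpha \alpha}$, and an opposite-spin part, $J_{\alpha \beta}$. The same-spin part $e^{iJ_{\alpha \alpha}^{(\mu)}}$ acts within each spin sector.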

A circuit diagram showing linearly-coupled qubits and corresponding alpha/beta circuits.

J_{\alpha \beta}

A circuit diagram showing 4 alpha qubits connected to the 4 beta qubits.

U_{nn}

J_{\alpha \alpha}

A circuit diagram showing 4 alpha qubits and 4 beta qubits each with R-Z gates, followed by two-qubit gates.

Which $U_{nn}$ gates are kept in the $J_{\alpha \beta}$ block depends on the qubit topology:

Grid: we can have $U_{nn}$ gates between all $\alpha$ and $\beta$ orbitals without any SWAPs, and therefore, do not need to remove any $U_{nn}$ gates.
Hexagonal: Orbitals with every other index (0th, 2nd, 4th, etc.) become nearest neighbors when $\alpha$ and $\beta$ are laid out in two adjacent linear chains.
Linear: Only one $\alpha$ and one $\beta$ orbital are connected, which means the $J_{\alpha \beta}$ block will have only one gate.
Heavy-hex: The $\alpha$-$\beta$ interactions are kept between every 4th indexed (0th, 4th, 8th, etc.) spin orbital and are ancilla-mediated, i.e., we need ancilla qubits between the linear chains representing the $\alpha$ and $\beta$ orbitals. This arrangement needs a limited number of SWAPs.
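The topology rules above can be turned into lists of orbital index pairs. The following is a minimal sketch: the helper name `interaction_pairs`, the `ALPHA_BETA_STEP` table, and the representation of pairs as `(p, q)` tuples are assumptions made for illustration. The linear case is left out because which single $\alpha$-$\beta$ pair is connected depends on how the two chains are joined.

```python
# Spacing between kept alpha-beta couplings, read off the list above.
ALPHA_BETA_STEP = {"grid": 1, "hexagonal": 2, "heavy-hex": 4}


def interaction_pairs(num_orbitals, topology):
    """Return (same-spin pairs, alpha-beta pairs) for a given topology."""
    # Same-spin orbitals sit on a linear chain: nearest neighbors only.
    alpha_alpha = [(p, p + 1) for p in range(num_orbitals - 1)]
    # Opposite-spin couplings connect orbital p (alpha) with orbital p (beta).
    step = ALPHA_BETA_STEP[topology]
    alpha_beta = [(p, p) for p in range(0, num_orbitals, step)]
    return alpha_alpha, alpha_beta


for name in ALPHA_BETA_STEP:
    _, ab = interaction_pairs(8, name)
    print(f"{name:>9}: {ab}")
```

For 8 orbitals, the heavy-hex layout keeps only two opposite-spin couplings, which is the price paid for avoiding most SWAP gates.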

Connectivity diagrams for different qubit layouts. They show qubits arranged on a square grid, a hexagonal lattice, a heavy-hex lattice (hexagonal lattice with one extra qubit along each side of the hexagon), and a linear chain.


1.2 LUCJ ansatz initialization

The LUCJ ansatz is parameterized, and we need to initialize its parameters before hardware execution. One way to initialize the ansatz is to use the t1 and t2 amplitudes from the classical coupled cluster singles and doubles (CCSD) method, where the t1 amplitudes are the coefficients of the single excitation operators and the t2 amplitudes are the coefficients of the double excitation operators.

Note that while initializing the LUCJ ansatz with t1 and t2 amplitudes generates decent results, the ansatz parameters may need further optimization.

# Get CCSD t1 and t2 amplitudes for initializing the ansatz
ccsd = pyscf.cc.CCSD(
    scf, frozen=[i for i in range(mol.nao_nr()) if i not in active_space]
)
ccsd.run()
 
t1 = ccsd.t1
t2 = ccsd.t2

Output:

E(CCSD) = -109.0398256929733  E_corr = -0.20458912219883

1.3 Constructing the LUCJ ansatz using `ffsim`

We will use the ffsim package to create and initialize the ansatz with the t1 and t2 amplitudes computed above. Since our molecule has a closed-shell Hartree-Fock state, we will use the spin-balanced variant of the UCJ ansatz, UCJOpSpinBalanced.

As IBM hardware has a heavy-hex topology, we will adopt the zig-zag pattern used in [1] and explained above for qubit interactions. In this pattern, orbitals (qubits) with the same spin are connected with a line topology (red and blue circles). Due to the heavy-hex topology, orbitals for different spins have connections between every 4th orbital (0th, 4th, 8th, etc.) (purple circles).

A zig-zag pattern traced out along a heavy-hex lattice.

import ffsim
from qiskit import QuantumCircuit, QuantumRegister
 
n_reps = 2
alpha_alpha_indices = [(p, p + 1) for p in range(num_orbitals - 1)]
alpha_beta_indices = [(p, p) for p in range(0, num_orbitals, 4)]
 
ucj_op = ffsim.UCJOpSpinBalanced.from_t_amplitudes(
    t2=t2,
    t1=t1,
    n_reps=n_reps,
    interaction_pairs=(alpha_alpha_indices, alpha_beta_indices),
)
 
nelec = (num_elec_a, num_elec_b)
 
# create an empty quantum circuit
qubits = QuantumRegister(2 * num_orbitals, name="q")
circuit = QuantumCircuit(qubits)
 
# prepare Hartree-Fock state as the reference state and append it to the quantum circuit
circuit.append(ffsim.qiskit.PrepareHartreeFockJW(num_orbitals, nelec), qubits)
 
# apply the UCJ operator to the reference state
circuit.append(ffsim.qiskit.UCJOpSpinBalancedJW(ucj_op), qubits)
circuit.measure_all()
# circuit.decompose().draw("mpl", scale=0.5, fold=-1)

The LUCJ ansatz with repeated layers can be optimized by merging some adjacent blocks. Consider the case n_reps=2. The two orbital rotation blocks in the middle can be merged into a single orbital rotation block. The ffsim package provides a pass manager named ffsim.qiskit.PRE_INIT that optimizes the circuit by merging such adjacent blocks.
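The reason two adjacent orbital rotation blocks can be merged is that orbital rotations are unitary matrices on the orbital space, so their product is again a single orbital rotation. The sketch below illustrates this with the simplest case, two 2x2 Givens rotations acting on the same pair of orbitals. It is a toy illustration only and does not use or reproduce the ffsim passes.

```python
import math


def givens(theta):
    """2x2 real rotation by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


theta_1, theta_2 = 0.3, 0.5

# Two rotation blocks applied one after the other ...
two_blocks = matmul(givens(theta_2), givens(theta_1))
# ... equal a single rotation block with the combined angle.
one_block = givens(theta_1 + theta_2)

max_diff = max(
    abs(two_blocks[i][j] - one_block[i][j]) for i in range(2) for j in range(2)
)
print(f"Largest entry-wise difference: {max_diff:.1e}")
```

On hardware, the merged block costs one layer of Givens rotations instead of two, which is where the gate savings come from.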

A diagram showing layers of the LUCJ ansatz.

2. Optimize for target hardware

First, we fetch a backend of our choice. We will optimize our circuit for the backend, and then execute the optimized circuit on the same backend to generate samples for the subspace.

from qiskit_ibm_runtime import QiskitRuntimeService
 
service = QiskitRuntimeService()
backend = service.backend("ibm_kyiv")

Next, we recommend the following steps to optimize the ansatz and make it hardware-compatible.

Select physical qubits (initial_layout) from the target hardware that adhere to the zig-zag pattern (two linear chains with ancilla qubits in between them) described above. Laying out qubits in this pattern leads to an efficient hardware-compatible circuit with fewer gates.
Generate a staged pass manager using the generate_preset_pass_manager function from Qiskit with your choice of backend and initial_layout.
Set the pre_init stage of your staged pass manager to ffsim.qiskit.PRE_INIT. ffsim.qiskit.PRE_INIT includes Qiskit transpiler passes that decompose gates into orbital rotations and then merge the orbital rotations, resulting in fewer gates in the final circuit.
Run the pass manager on your circuit.

from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
 
spin_a_layout = [0, 14, 18, 19, 20, 33, 39, 40, 41, 53, 60, 61, 62, 72, 81, 82]
spin_b_layout = [2, 3, 4, 15, 22, 23, 24, 34, 43, 44, 45, 54, 64, 65, 66, 73]
 
initial_layout = spin_a_layout + spin_b_layout
 
pass_manager = generate_preset_pass_manager(
    optimization_level=3, backend=backend, initial_layout=initial_layout
)
 
# without PRE_INIT passes
isa_circuit = pass_manager.run(circuit)
print(f"Gate counts (w/o pre-init passes): {isa_circuit.count_ops()}")
 
# with PRE_INIT passes
# We will use the circuit generated by this pass manager for hardware execution
pass_manager.pre_init = ffsim.qiskit.PRE_INIT
isa_circuit = pass_manager.run(circuit)
print(f"Gate counts (w/ pre-init passes): {isa_circuit.count_ops()}")

Output:

Gate counts (w/o pre-init passes): OrderedDict({'rz': 7579, 'sx': 6106, 'ecr': 2316, 'x': 336, 'measure': 32, 'barrier': 1})
Gate counts (w/ pre-init passes): OrderedDict({'rz': 4088, 'sx': 3125, 'ecr': 1262, 'x': 201, 'measure': 32, 'barrier': 1})
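To quantify the effect of the pre-init passes, the gate counts printed above can be compared directly. The short sketch below copies the numbers from that output and computes the relative reduction per gate type; the variable names are illustrative.

```python
# Gate counts copied from the output above.
before = {"rz": 7579, "sx": 6106, "ecr": 2316, "x": 336}
after = {"rz": 4088, "sx": 3125, "ecr": 1262, "x": 201}

# Relative reduction in percent for each gate type.
reduction = {
    gate: 100 * (before[gate] - after[gate]) / before[gate] for gate in before
}

for gate, percent in reduction.items():
    print(f"{gate:>3}: {before[gate]:>5} -> {after[gate]:>5} ({percent:.1f}% fewer)")
```

The two-qubit `ecr` count matters most on noisy hardware, and it drops by a little under half in this run.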

3. Execute on target hardware

After optimizing the circuit for hardware execution, we are ready to run it on the target hardware and collect samples for ground state energy estimation. As we only have one circuit, we will use Qiskit Runtime's Job execution mode and execute our circuit.

from qiskit_ibm_runtime import SamplerV2 as Sampler
 
sampler = Sampler(mode=backend)
sampler.options.dynamical_decoupling.enable = True
 
job = sampler.run([isa_circuit], shots=10_000)  # Takes approximately 5sec of QPU time

# Run cell after IQX job completion
primitive_result = job.result()
pub_result = primitive_result[0]
counts = pub_result.data.meas.get_counts()

4. Post-process results

The post-processing part of the SQD workflow can be summarized using the following diagram.

A flow chart showing how sampled states are used to determine ground state eigenvalues and eigenvectors.

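Each batch $\mathcal{S}^{(k)}$ of configurations drawn from the sampled set $\tilde{\mathcal{\chi}}$ spans a subspace onto which the Hamiltonian is projected:

H_{\mathcal{S}^{(k)}} = P_{\mathcal{S}^{(k)}} H P_{\mathcal{S}^{(k)}} \text{ with } P_{\mathcal{S}^{(k)}} = \sum_{x \in \mathcal{S}^{(k)}} \vert x \rangle \langle x \vert

The projected Hamiltonian $H_{\mathcal{S}^{(k)}}$ is then diagonalized on a classical computer:

H_{\mathcal{S}^{(k)}} \vert \psi^{(k)} \rangle = E^{(k)} \vert \psi^{(k)} \rangle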
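A toy version of the projection and diagonalization step may help. In the sketch below, a made-up 3x3 Hamiltonian is written in a basis of three configurations, a batch containing two of them defines the subspace, and the resulting 2x2 block is diagonalized in closed form. All numbers and names are illustrative assumptions; the real workflow uses a dedicated eigensolver on much larger subspaces.

```python
import math

# Toy Hamiltonian in a basis of three configurations (made-up numbers).
hamiltonian = [
    [-1.0, 0.2, 0.5],
    [0.2, 0.3, 0.1],
    [0.5, 0.1, 0.0],
]

# Indices of the configurations contained in the batch S^(k).
subspace = [0, 2]

# P H P restricted to the subspace: keep only rows and columns in the batch.
projected = [[hamiltonian[i][j] for j in subspace] for i in subspace]


def lowest_eigenvalue_2x2(m):
    """Smallest eigenvalue of a real symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b**2)


energy = lowest_eigenvalue_2x2(projected)
print(f"Lowest energy in the subspace: {energy:.4f}")
```

By the variational principle, the lowest energy found in a subspace is an upper bound on the true ground state energy, so batches that contain more of the important configurations give lower energies.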


4.1 Configuration recovery: overview

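Let $x$ be a sampled bitstring with the wrong particle number. For each bit $i$, the distance between the bit value and the current average occupancy of the corresponding orbital,

|x[i] - avg\_occupancy[i]|

is mapped to a flipping weight by a modified ReLU function:

\begin{align} w(y) = \begin{cases} \delta \frac{y}{h} & \text{if } y \leq h\\ \delta + (1 - \delta) \frac{y - h}{1 - h} & \text{if } y > h \end{cases} \nonumber \end{align}

Here $h$ marks the corner of the function, where $w(h) = \delta$, and $n$ denotes the vector of average orbital occupancies (avg_occupancy).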

As an example, consider $N = 2$ particles in four orbitals (qubits Q3 to Q0).

Suppose in the first iteration we run two batches, and the estimated ground states from them are:

\begin{align}\nonumber \text{Batch0: } \vert \psi \rangle &= 0.8 \times \vert 1001 \rangle + 0.6 \times \vert 0110 \rangle \\ \nonumber \text{Batch1: } \vert \psi \rangle &= \frac{1}{\sqrt{3}} \left( \vert 1001 \rangle + \vert 0101 \rangle + \vert 0110 \rangle \right) \nonumber \end{align}

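The occupancy of each qubit is obtained by summing the squared amplitudes of the basis states in which that qubit is 1.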

Occupancy (Batch0):

	Q3	Q2	Q1	Q0
1001	0.64	0.0	0.0	0.64
0110	0.0	0.36	0.36	0.0
n (Batch0)	0.64	0.36	0.36	0.64

Occupancy (Batch1)

	Q3	Q2	Q1	Q0
1001	0.33	0.00	0.00	0.33
0101	0.0	0.33	0.00	0.33
0110	0.0	0.33	0.33	0.00
n (Batch1)	0.33	0.66	0.33	0.66

Occupancy (average across batches)

	Q3	Q2	Q1	Q0
n (Batch0)	0.64	0.36	0.36	0.64
n (Batch1)	0.33	0.66	0.33	0.66
n (average)	0.49	0.51	0.35	0.65
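The occupancy tables above can be reproduced with a few lines of Python. In this sketch each batch state is a dictionary from bitstring to amplitude, using the basis states and values listed in the tables; the helper name `occupancy` is hypothetical.

```python
import math


def occupancy(state):
    """Per-qubit occupancy: sum of |amplitude|^2 over basis states with that bit set.

    Columns follow the bitstring order, i.e. Q3, Q2, Q1, Q0.
    """
    num_qubits = len(next(iter(state)))
    occ = [0.0] * num_qubits
    for bitstring, amplitude in state.items():
        for column, bit in enumerate(bitstring):
            if bit == "1":
                occ[column] += abs(amplitude) ** 2
    return occ


batch0 = {"1001": 0.8, "0110": 0.6}
a = 1 / math.sqrt(3)
batch1 = {"1001": a, "0101": a, "0110": a}

occ0 = occupancy(batch0)
occ1 = occupancy(batch1)

# Average the occupancies across the two batches.
n_avg = [(x + y) / 2 for x, y in zip(occ0, occ1)]
print([round(v, 2) for v in n_avg])  # [0.49, 0.51, 0.35, 0.65]
```

Note that the tables truncate 2/3 to 0.66 for Batch1, while the code keeps full precision before rounding the average.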

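Now suppose a sampled configuration is $x = \vert 1000 \rangle$. It contains only one particle, so one of its unoccupied bits has to be flipped to restore the correct particle number.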

	Q3	Q2	Q1	Q0
p(flip) ( $\vert x[i] - \text{n}[i] \vert$ )	0	0.51	0.35	0.65
w(p(flip))	0	0.03	0.007	0.31
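The weights in the table above can be checked numerically. The sketch below assumes $h = 0.5$ (two particles in four orbitals) and $\delta = 0.01$; these two values are not stated in the text but reproduce the tabulated weights. It also assumes that only unoccupied bits are candidates for flipping, because $x$ has too few particles. The function and variable names are illustrative.

```python
def relu_weight(y, h, delta):
    """Modified ReLU that maps a distance y in [0, 1] to a flipping weight."""
    if y <= h:
        return delta * y / h
    return delta + (1.0 - delta) * (y - h) / (1.0 - h)


n_avg = [0.49, 0.51, 0.35, 0.65]  # average occupancies for Q3, Q2, Q1, Q0
x = "1000"                        # sampled configuration with one particle too few
h, delta = 0.5, 0.01              # assumed values, see the text above

weights = []
for bit, n_i in zip(x, n_avg):
    if bit == "1":
        # An occupied bit is not a candidate when particles are missing.
        weights.append(0.0)
    else:
        weights.append(relu_weight(abs(int(bit) - n_i), h, delta))

# Normalize the weights to obtain flipping probabilities.
total = sum(weights)
flip_probs = [w / total for w in weights]

# Flip the most likely bit to show the most probable outcome.
best = max(range(len(x)), key=lambda i: weights[i])
recovered = x[:best] + "1" + x[best + 1:]
print(recovered)  # 1001
```

In the actual procedure the bit to flip is drawn at random according to these weights, so other outcomes are possible but less likely.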

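Q0 has by far the largest weight, so it is the bit most likely to be flipped, which turns $x$ into $\vert \text{1001} \rangle$.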

The complete self-consistent configuration recovery process can be summarized as follows:

First iteration:

1.1 Split the measured configurations $\widetilde{\chi}$ into those with the correct particle number, $\widetilde{\chi}_{correct}$, and those with a wrong particle number, $\widetilde{\chi}_{incorrect}$.
1.2 Configurations from $\widetilde{\chi}_{correct}$ are randomly sampled to create batches $(\mathcal{S}^{(1)}, \cdots, \mathcal{S}^{(K)})$ of vectors for subspace projection. The number of batches and the number of samples in each batch are user-defined parameters. The larger the number of samples in each batch, the larger the subspace dimension and the more computationally demanding the diagonalization becomes. On the other hand, too few samples may miss configurations in the support of the ground state and lead to an incorrect estimate.
1.3 Run the eigenstate solver (i.e. projection onto the subspace and diagonalization) on the batches and obtain approximate eigenstates $|\psi^{(1)}\rangle, \cdots, |\psi^{(K)}\rangle$.
1.4 From the approximate eigenstates, construct the first guess for $n$.

Subsequent iterations:

2.1 Using $n$, correct the configurations with wrong particle number in $\widetilde{\chi}_{incorrect}$. Suppose we name the result $\widetilde{\chi}_{correct\_new}$. Then $\widetilde{\chi}_{recovered} (\widetilde{\chi}_{R}) = \widetilde{\chi}_{correct} \cup \widetilde{\chi}_{correct\_new}$ forms the new set of configurations with correct particle numbers.
2.2 $\widetilde{\chi}_{R}$ is sampled to create batches $\mathcal{S}^{(1)}, \cdots, \mathcal{S}^{(K)}$.
2.3 The eigenstate solver runs on the new batches and generates new estimates of the ground state, $|\psi^{(1)}\rangle, \cdots, |\psi^{(K)}\rangle$.
2.4 From the approximate eigenstates, construct a refined guess for $n$.
2.5 If the stopping criterion is not met, go back to step 2.1.
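The post-selection and batching steps can be sketched in plain Python. This is a toy illustration under stated assumptions: the counts are made up, the helper names `postselect` and `subsample` are hypothetical, and the sampling scheme (weighted draws without replacement within a batch) is one reasonable choice rather than a description of what the SQD addon does internally.

```python
import random


def postselect(counts, hamming_left, hamming_right):
    """Keep bitstrings whose left and right halves have the required number of 1s."""
    kept = {}
    for bitstring, count in counts.items():
        half = len(bitstring) // 2
        left, right = bitstring[:half], bitstring[half:]
        if left.count("1") == hamming_left and right.count("1") == hamming_right:
            kept[bitstring] = count
    return kept


def subsample(weighted, samples_per_batch, num_batches, rng):
    """Draw batches of unique bitstrings, weighted by how often each was measured."""
    batches = []
    for _ in range(num_batches):
        pool = dict(weighted)
        batch = []
        while pool and len(batch) < samples_per_batch:
            pick = rng.choices(list(pool), weights=list(pool.values()), k=1)[0]
            batch.append(pick)
            del pool[pick]  # no repeats within a batch
        batches.append(batch)
    return batches


# Made-up measurement counts on 4 qubits (2 per spin sector).
counts = {"0101": 40, "1001": 30, "0110": 20, "1101": 5, "0001": 5}

rng = random.Random(24)
valid = postselect(counts, hamming_left=1, hamming_right=1)
batches = subsample(valid, samples_per_batch=2, num_batches=3, rng=rng)
print(sorted(valid))
print(batches)
```

Frequently measured configurations appear in more batches, while the randomness across batches lets rarer configurations contribute to at least some of the subspaces.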

4.2 Ground state estimation

First, we will transform the counts into a bitstring matrix and probability array for post-processing.

Each row in the matrix represents one unique bitstring. Since qubits are indexed from the right of a bitstring in Qiskit, column 0 represents qubit N-1, and column N-1 represents qubit 0, where N is the number of qubits.

The alpha orbitals are represented by the columns N/2 to N-1 (right half), and the beta orbitals are represented by the columns 0 to N/2-1 (left half).
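The indexing convention described above can be illustrated on a single bitstring. The sketch below uses a made-up 8-qubit bitstring and hypothetical helper names; it only demonstrates the column-to-qubit mapping and the split into spin sectors.

```python
def split_spin_sectors(bitstring):
    """Split a bitstring into its alpha (right half) and beta (left half) parts."""
    half = len(bitstring) // 2
    beta, alpha = bitstring[:half], bitstring[half:]
    return alpha, beta


def qubit_for_column(column, num_qubits):
    """Column 0 is the leftmost character and holds the highest qubit index."""
    return num_qubits - 1 - column


bitstring = "00110101"  # made-up example with N = 8 qubits
alpha, beta = split_spin_sectors(bitstring)

print(f"alpha part: {alpha}, electrons: {alpha.count('1')}")
print(f"beta part:  {beta}, electrons: {beta.count('1')}")
print(f"column 0 holds qubit {qubit_for_column(0, len(bitstring))}")
```

The number of 1s in each half is the quantity that post-selection and configuration recovery compare against the expected number of alpha and beta electrons.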

from qiskit_addon_sqd.counts import counts_to_arrays
 
# Convert counts into bitstring and probability arrays
bitstring_matrix_full, probs_arr_full = counts_to_arrays(counts)

There are a few user-controlled options which are important for this technique:

iterations: Number of self-consistent configuration recovery iterations
n_batches: Number of batches of configurations used by the different calls to the eigenstate solver
samples_per_batch: Number of unique configurations to include in each batch
max_davidson_cycles: Maximum number of Davidson cycles run by each eigensolver

import numpy as np
from qiskit_addon_sqd.configuration_recovery import recover_configurations
from qiskit_addon_sqd.fermion import (
    bitstring_matrix_to_ci_strs,
    solve_fermion,
)
from qiskit_addon_sqd.subsampling import postselect_and_subsample
 
rng = np.random.default_rng(24)
# SQD options
iterations = 5
 
# Eigenstate solver options
n_batches = 5
samples_per_batch = 500
max_davidson_cycles = 300
 
# Self-consistent configuration recovery loop
e_hist = np.zeros((iterations, n_batches))  # energy history
s_hist = np.zeros((iterations, n_batches))  # spin history
occupancy_hist = []
avg_occupancy = None
for i in range(iterations):
    print(f"Starting configuration recovery iteration {i}")
    # On the first iteration, we have no orbital occupancy information from the
    # solver, so we begin with the full set of noisy configurations.
    if avg_occupancy is None:
        bs_mat_tmp = bitstring_matrix_full
        probs_arr_tmp = probs_arr_full
 
    # If we have average orbital occupancy information, we use it to refine the full set of noisy configurations
    else:
        bs_mat_tmp, probs_arr_tmp = recover_configurations(
            bitstring_matrix_full,
            probs_arr_full,
            avg_occupancy,
            num_elec_a,
            num_elec_b,
            rand_seed=rng,
        )
 
    # Create batches of subsamples. We post-select here to remove configurations
    # with incorrect hamming weight during iteration 0, since no config recovery was performed.
    batches = postselect_and_subsample(
        bs_mat_tmp,
        probs_arr_tmp,
        hamming_right=num_elec_a,
        hamming_left=num_elec_b,
        samples_per_batch=samples_per_batch,
        num_batches=n_batches,
        rand_seed=rng,
    )
 
    # Run eigenstate solvers in a loop. This loop should be parallelized for larger problems.
    e_tmp = np.zeros(n_batches)
    s_tmp = np.zeros(n_batches)
    occs_tmp = []
    coeffs = []
    for j in range(n_batches):
        strs_a, strs_b = bitstring_matrix_to_ci_strs(batches[j])
        print(f"  Batch {j} subspace dimension: {len(strs_a) * len(strs_b)}")
        energy_sci, coeffs_sci, avg_occs, spin = solve_fermion(
            batches[j],
            hcore,
            eri,
            open_shell=open_shell,
            spin_sq=spin_sq,
            max_davidson=max_davidson_cycles,
        )
        energy_sci += nuclear_repulsion_energy
        e_tmp[j] = energy_sci
        s_tmp[j] = spin
        occs_tmp.append(avg_occs)
        coeffs.append(coeffs_sci)
 
    # Combine batch results
    avg_occupancy = tuple(np.mean(occs_tmp, axis=0))
 
    # Track optimization history
    e_hist[i, :] = e_tmp
    s_hist[i, :] = s_tmp
    occupancy_hist.append(avg_occupancy)

Output:

Starting configuration recovery iteration 0
  Batch 0 subspace dimension: 21609
  Batch 1 subspace dimension: 21609
  Batch 2 subspace dimension: 21609
  Batch 3 subspace dimension: 21609
  Batch 4 subspace dimension: 21609
Starting configuration recovery iteration 1
  Batch 0 subspace dimension: 609961
  Batch 1 subspace dimension: 616225
  Batch 2 subspace dimension: 627264
  Batch 3 subspace dimension: 633616
  Batch 4 subspace dimension: 624100
Starting configuration recovery iteration 2
  Batch 0 subspace dimension: 564001
  Batch 1 subspace dimension: 605284
  Batch 2 subspace dimension: 582169
  Batch 3 subspace dimension: 559504
  Batch 4 subspace dimension: 591361
Starting configuration recovery iteration 3
  Batch 0 subspace dimension: 550564
  Batch 1 subspace dimension: 549081
  Batch 2 subspace dimension: 531441
  Batch 3 subspace dimension: 527076
  Batch 4 subspace dimension: 531441
Starting configuration recovery iteration 4
  Batch 0 subspace dimension: 544644
  Batch 1 subspace dimension: 580644
  Batch 2 subspace dimension: 527076
  Batch 3 subspace dimension: 531441
  Batch 4 subspace dimension: 537289
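To put these subspace dimensions in context, they can be compared with the dimension of the full configuration space. The sketch below assumes 16 spatial orbitals with 5 alpha and 5 beta electrons, which is consistent with the 32 measured qubits reported earlier and with freezing 2 orbitals of $N_2$; treat these numbers as assumptions if your setup differs. The subspace dimensions are copied from the output above.

```python
import math

# Assumed active space: 16 spatial orbitals, 5 alpha and 5 beta electrons.
num_orbitals, num_elec_a, num_elec_b = 16, 5, 5

# Number of ways to place the alpha and beta electrons independently.
full_dim = math.comb(num_orbitals, num_elec_a) * math.comb(num_orbitals, num_elec_b)

first_iteration_dim = 21609  # every batch in iteration 0
largest_dim = 633616         # largest batch in the output above

# The printed dimension is (number of alpha strings) * (number of beta strings).
strings_per_spin = math.isqrt(first_iteration_dim)
fraction = largest_dim / full_dim

print(f"Full configuration space dimension: {full_dim}")
print(f"Strings per spin in iteration 0: {strings_per_spin}")
print(f"Largest subspace covers {100 * fraction:.1f}% of the full space")
```

All printed dimensions are perfect squares, which suggests that each batch uses the same number of alpha and beta strings. Even the largest batch spans only a few percent of the full space.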

4.3 Discussion of results

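The first plot shows the error of the estimated ground state energy, relative to the exact (CASCI) energy, at each configuration recovery iteration, together with a line marking chemical accuracy. Chemical accuracy is commonly taken as 1 kcal/mol, which is $\approx 1.6$ mHa; the code below draws the line at 1 mHa. The second plot shows the average occupancy of each spatial orbital after the final iteration.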

# Data for energies plot
x1 = range(iterations)
min_e = [np.min(e) for e in e_hist]
e_diff = [abs(e - exact_energy) for e in min_e]
yt1 = [1.0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
 
# Chemical accuracy (+/- 1 milli-Hartree)
chem_accuracy = 0.001
 
# Data for avg spatial orbital occupancy
y2 = occupancy_hist[-1][0] + occupancy_hist[-1][1]
x2 = range(len(y2))

import matplotlib.pyplot as plt
 
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
 
# Plot energies
axs[0].plot(x1, e_diff, label="energy error", marker="o")
axs[0].set_xticks(x1)
axs[0].set_xticklabels(x1)
axs[0].set_yticks(yt1)
axs[0].set_yticklabels(yt1)
axs[0].set_yscale("log")
axs[0].set_ylim(1e-6)
axs[0].axhline(
    y=chem_accuracy, color="#BF5700", linestyle="--", label="chemical accuracy"
)
axs[0].set_title("Approximated Ground State Energy Error vs SQD Iterations")
axs[0].set_xlabel("Iteration Index", fontdict={"fontsize": 12})
axs[0].set_ylabel("Energy Error (Ha)", fontdict={"fontsize": 12})
axs[0].legend()
 
# Plot orbital occupancy
axs[1].bar(x2, y2, width=0.8)
axs[1].set_xticks(x2)
axs[1].set_xticklabels(x2)
axs[1].set_title("Avg Occupancy per Spatial Orbital")
axs[1].set_xlabel("Orbital Index", fontdict={"fontsize": 12})
axs[1].set_ylabel("Avg Occupancy", fontdict={"fontsize": 12})
 
print(f"Exact energy: {exact_energy:.5f} Ha")
print(f"SQD energy: {min_e[-1]:.5f} Ha")
print(f"Absolute error: {e_diff[-1]:.5f} Ha")
plt.tight_layout()
plt.show()

Output:

Exact energy: -109.04667 Ha
SQD energy: -109.02234 Ha
Absolute error: 0.02434 Ha
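For a sense of scale, the error can be converted to milli-Hartree and kcal/mol and compared with the 1 mHa threshold used in the plot. The sketch below starts from the rounded energies printed above, so its error differs from the printed value in the last digit; the conversion factor 1 Ha = 627.5095 kcal/mol is the standard one.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095

# Rounded values copied from the output above.
exact_energy = -109.04667
sqd_energy = -109.02234

error_ha = abs(sqd_energy - exact_energy)
error_mha = 1000 * error_ha
error_kcal = error_ha * HARTREE_TO_KCAL_PER_MOL

# Threshold used for the "chemical accuracy" line in the plot.
chem_accuracy_ha = 0.001
ratio = error_ha / chem_accuracy_ha

print(f"Error: {error_mha:.2f} mHa = {error_kcal:.1f} kcal/mol")
print(f"Error is about {ratio:.0f}x the chemical accuracy threshold")
```

With these settings the estimate is still well above chemical accuracy, which is consistent with the earlier note that the ansatz parameters may need further optimization.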

Exercise for the reader

1000

References

[1] M. Motta et al., "Bridging physical intuition and hardware efficiency for correlated electronic states: the local unitary cluster Jastrow ansatz for electronic structure", Chem. Sci. 14, 11213 (2023).

[2] J. Robledo-Moreno et al., "Chemistry Beyond Exact Solutions on a Quantum-Centric Supercomputer" (2024), arXiv:2405.05068.
