Skip to main content
IBM Quantum Platform

Classical information

Like we did in the previous lesson, we'll begin this lesson with a discussion of classical information. Once again, the probabilistic and quantum descriptions are mathematically similar, and recognizing how the mathematics works in the familiar setting of classical information is helpful in understanding why quantum information is described in the way that it is.


Classical states via the Cartesian product

We'll start at a very basic level, with classical states of multiple systems. For simplicity, we'll begin by discussing just two systems, and then generalize to more than two systems.

To be precise, let X\mathsf{X} be a system whose classical state set is Σ,\Sigma, and let Y\mathsf{Y} be a second system whose classical state set is Γ.\Gamma. Note that, because we have referred to these sets as classical state sets, our assumption is that Σ\Sigma and Γ\Gamma are both finite and nonempty. It could be that Σ=Γ,\Sigma = \Gamma, but this is not necessarily so — and regardless, it will be helpful to use different names to refer to these sets in the interest of clarity.

Now imagine that the two systems, X\mathsf{X} and Y,\mathsf{Y}, are placed side-by-side, with X\mathsf{X} on the left and Y\mathsf{Y} on the right. If we so choose, we can view these two systems as if they form a single system, which we can denote by (X,Y)(\mathsf{X},\mathsf{Y}) or XY\mathsf{XY} depending on our preference. A natural question to ask about this compound system (X,Y)(\mathsf{X},\mathsf{Y}) is, "What are its classical states?"

The answer is that the set of classical states of (X,Y)(\mathsf{X},\mathsf{Y}) is the Cartesian product of Σ\Sigma and Γ,\Gamma, which is the set defined as

Σ×Γ={(a,b):aΣ  and  bΓ}. \Sigma\times\Gamma = \bigl\{(a,b)\,:\,a\in\Sigma\;\text{and}\;b\in\Gamma\bigr\}.

In simple terms, the Cartesian product is precisely the mathematical notion that captures the idea of viewing an element of one set and an element of a second set together, as if they form a single element of a single set. In the case at hand, to say that (X,Y)(\mathsf{X},\mathsf{Y}) is in the classical state (a,b)Σ×Γ(a,b)\in\Sigma\times\Gamma means that X\mathsf{X} is in the classical state aΣa\in\Sigma and Y\mathsf{Y} is in the classical state bΓ;b\in\Gamma; and if the classical state of X\mathsf{X} is aΣa\in\Sigma and the classical state of Y\mathsf{Y} is bΓ,b\in\Gamma, then the classical state of the joint system (X,Y)(\mathsf{X},\mathsf{Y}) is (a,b).(a,b).

For more than two systems, the situation generalizes in a natural way. If we suppose that X1,,Xn\mathsf{X}_1,\ldots,\mathsf{X}_n are systems having classical state sets Σ1,,Σn,\Sigma_1,\ldots,\Sigma_n, respectively, for any positive integer n,n, the classical state set of the nn-tuple (X1,,Xn),(\mathsf{X}_1,\ldots,\mathsf{X}_n), viewed as a single joint system, is the Cartesian product

Σ1××Σn={(a1,,an):a1Σ1,,anΣn}. \Sigma_1\times\cdots\times\Sigma_n = \bigl\{(a_1,\ldots,a_n)\,:\, a_1\in\Sigma_1,\:\ldots,\:a_n\in\Sigma_n\bigr\}.

Of course, we are free to use whatever names we wish for systems, and to order them as we choose. In particular, if we have nn systems like above, we could instead choose to name them X0,,Xn1\mathsf{X}_{0},\ldots,\mathsf{X}_{n-1} and arrange them from right to left, so that the joint system becomes (Xn1,,X0).(\mathsf{X}_{n-1},\ldots,\mathsf{X}_0). Following the same pattern for naming the associated classical states and classical state sets, we might then refer to a classical state

(an1,,a0)Σn1××Σ0(a_{n-1},\ldots,a_0) \in \Sigma_{n-1}\times \cdots \times \Sigma_0

of this compound system. Indeed, this is the ordering convention used by Qiskit when naming multiple qubits. We'll come back to this convention and how it connects to quantum circuits in the next lesson, but we'll start using it now to help to get used to it.

It is often convenient to write a classical state of the form (an1,,a0)(a_{n-1},\ldots,a_0) as a A *string* is a finite, ordered sequence of symbols or characters. an1a0a_{n-1}\cdots a_0 for the sake of brevity, particularly in the very typical situation that the classical state sets Σ0,,Σn1\Sigma_0,\ldots,\Sigma_{n-1} are associated with sets of symbols or characters. In this context, the term alphabet is commonly used to refer to sets of symbols used to form strings, but the mathematical definition of an alphabet is precisely the same as the definition of a classical state set: it is a finite and nonempty set.

For example, suppose that X0,,X9\mathsf{X}_0,\ldots,\mathsf{X}_9 are bits, so that the classical state sets of these systems are all the same.

Σ0=Σ1==Σ9={0,1} \Sigma_0 = \Sigma_1 = \cdots = \Sigma_9 = \{0,1\}

There are then 210=10242^{10} = 1024 classical states of the joint system (X9,,X0),(\mathsf{X}_9,\ldots,\mathsf{X}_0), which are the elements of the set

Σ9×Σ8××Σ0={0,1}10. \Sigma_9\times\Sigma_8\times\cdots\times\Sigma_0 = \{0,1\}^{10}.

Written as strings, these classical states look like this:

000000000000000000010000000010000000001100000001001111111111 \begin{array}{c} 0000000000\\ 0000000001\\ 0000000010\\ 0000000011\\ 0000000100\\ \vdots\\[1mm] 1111111111 \end{array}

For the classical state 0000000110,0000000110, for instance, we see that X1\mathsf{X}_1 and X2\mathsf{X}_2 are in the state 1,1, while all other systems are in the state 0.0.


Probabilistic states

Recall from the previous lesson that a probabilistic state associates a probability with each classical state of a system. Thus, a probabilistic state of multiple systems — viewed collectively as a single system — associates a probability with each element of the Cartesian product of the classical state sets of the individual systems.

For example, suppose that X\mathsf{X} and Y\mathsf{Y} are both bits, so that their corresponding classical state sets are Σ={0,1}\Sigma = \{0,1\} and Γ={0,1},\Gamma = \{0,1\}, respectively. Here is a probabilistic state of the pair (X,Y):(\mathsf{X},\mathsf{Y}):

Pr((X,Y)=(0,0))=1/2Pr((X,Y)=(0,1))=0Pr((X,Y)=(1,0))=0Pr((X,Y)=(1,1))=1/2 \begin{aligned} \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,0)\bigr) & = 1/2 \\[2mm] \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,1)\bigr) & = 0\\[2mm] \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,0)\bigr) & = 0\\[2mm] \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,1)\bigr) & = 1/2 \end{aligned}

This probabilistic state is one in which both X\mathsf{X} and Y\mathsf{Y} are random bits — each is 00 with probability 1/21/2 and 11 with probability 1/21/2 — but the classical states of the two bits always agree. This is an example of a correlation between these systems.

Ordering Cartesian product state sets

Probabilistic states of systems can be represented by probability vectors, as was discussed in the previous lesson. In particular, the vector entries represent probabilities for the system to be in the possible classical states of that system, and the understanding is that a correspondence between the entries and the set of classical states has been selected.

Choosing such a correspondence effectively means deciding on an ordering of the classical states, which is often natural or determined by a standard convention. For example, the binary alphabet {0,1}\{0,1\} is naturally ordered with 00 first and 11 second, so the first entry in a probability vector representing a probabilistic state of a bit is the probability for it to be in the state 0,0, and the second entry is the probability for it to be in the state 1.1.

None of this changes in the context of multiple systems, but there is a decision to be made. The classical state set of multiple systems together, viewed collectively as a single system, is the Cartesian product of the classical state sets of the individual systems — so we must decide how the elements of Cartesian products of classical state sets are to be ordered.

There is a simple convention that we follow for doing this, which is to start with whatever orderings are already in place for the individual classical state sets, and then to order the elements of the Cartesian product alphabetically. Another way to say this is that the entries in each nn-tuple (or, equivalently, the symbols in each string) are treated as though they have significance that decreases from left to right. For example, according to this convention, the Cartesian product {1,2,3}×{0,1}\{1,2,3\}\times\{0,1\} is ordered like this:

(1,0),  (1,1),  (2,0),  (2,1),  (3,0),  (3,1). (1,0),\; (1,1),\; (2,0),\; (2,1),\; (3,0),\; (3,1).

When nn-tuples are written as strings and ordered in this way, we observe familiar patterns, such as {0,1}×{0,1}\{0,1\}\times\{0,1\} being ordered as 00,01,10,11,00, 01, 10, 11, and the set {0,1}10\{0,1\}^{10} being ordered as it was written earlier in the lesson. As another example, viewing the set {0,1,,9}×{0,1,,9}\{0, 1, \dots, 9\} \times \{0, 1, \dots, 9\} as a set of strings, we obtain the two-digit numbers 0000 through 99,99, ordered numerically. This is obviously not a coincidence; our decimal number system uses precisely this sort of alphabetical ordering, where the word alphabetical should be understood as having a broad meaning that includes numerals in addition to letters.

Returning to the example of two bits from above, the probabilistic state described previously is therefore represented by the following probability vector, where the entries are labeled explicitly for the sake of clarity.

(120012)probability of being in the state 00probability of being in the state 01probability of being in the state 10probability of being in the state 11(1) \begin{pmatrix} \frac{1}{2}\\[1mm] 0\\[1mm] 0\\[1mm] \frac{1}{2} \end{pmatrix} \begin{array}{l} \leftarrow \text{probability of being in the state 00}\\[1mm] \leftarrow \text{probability of being in the state 01}\\[1mm] \leftarrow \text{probability of being in the state 10}\\[1mm] \leftarrow \text{probability of being in the state 11} \end{array} \tag{1}

Independence of two systems

A special type of probabilistic state of two systems is one in which the systems are independent. Intuitively speaking, two systems are independent if learning the classical state of either system has no effect on the probabilities associated with the other. That is, learning what classical state one of the systems is in provides no information at all about the classical state of the other.

To define this notion precisely, let us suppose once again that X\mathsf{X} and Y\mathsf{Y} are systems having classical state sets Σ\Sigma and Γ,\Gamma, respectively. With respect to a given probabilistic state of these systems, they are said to be independent if it is the case that

Pr((X,Y)=(a,b))=Pr(X=a)Pr(Y=b)(2) \operatorname{Pr}((\mathsf{X},\mathsf{Y}) = (a,b)) = \operatorname{Pr}(\mathsf{X} = a) \operatorname{Pr}(\mathsf{Y} = b) \tag{2}

for every choice of aΣa\in\Sigma and bΓ.b\in\Gamma.

To express this condition in terms of probability vectors, assume that the given probabilistic state of (X,Y)(\mathsf{X},\mathsf{Y}) is described by a probability vector, written in the Dirac notation as

(a,b)Σ×Γpabab.\sum_{(a,b) \in \Sigma\times\Gamma} p_{ab} \vert a b\rangle.

The condition (2)(2) for independence is then equivalent to the existence of two probability vectors

ϕ=aΣqaaandψ=bΓrbb,(3)\vert \phi \rangle = \sum_{a\in\Sigma} q_a \vert a \rangle \quad\text{and}\quad \vert \psi \rangle = \sum_{b\in\Gamma} r_b \vert b \rangle, \tag{3}

representing the probabilities associated with the classical states of X\mathsf{X} and Y,\mathsf{Y}, respectively, such that

pab=qarb(4)p_{ab} = q_a r_b \tag{4}

for all aΣa\in\Sigma and bΓ.b\in\Gamma.

For example, the probabilistic state of a pair of bits (X,Y)(\mathsf{X},\mathsf{Y}) represented by the vector

1600+11201+1210+1411 \frac{1}{6} \vert 00 \rangle + \frac{1}{12} \vert 01 \rangle + \frac{1}{2} \vert 10 \rangle + \frac{1}{4} \vert 11 \rangle

is one in which X\mathsf{X} and Y\mathsf{Y} are independent. Specifically, the condition required for independence is true for the probability vectors

ϕ=140+341andψ=230+131. \vert \phi \rangle = \frac{1}{4} \vert 0 \rangle + \frac{3}{4} \vert 1 \rangle \quad\text{and}\quad \vert \psi \rangle = \frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle.

For instance, to make the probabilities for the 0000 state match, we need 16=14×23,\frac{1}{6} = \frac{1}{4} \times \frac{2}{3}, and indeed this is the case. Other entries can be verified in a similar manner.

On the other hand, the probabilistic state (1),(1), which we may write as

1200+1211,(5) \frac{1}{2} \vert 00 \rangle + \frac{1}{2} \vert 11 \rangle, \tag{5}

does not represent independence between the systems X\mathsf{X} and Y.\mathsf{Y}. A simple way to argue this follows.

Suppose that there did exist probability vectors ϕ\vert \phi\rangle and ψ,\vert \psi \rangle, as in equation (3)(3) above, for which the condition (4)(4) is satisfied for every choice of aa and b.b. It would then necessarily be that

q0r1=Pr((X,Y)=(0,1))=0. q_0 r_1 = \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (0,1)\bigr) = 0.

This implies that either q0=0q_0 = 0 or r1=0,r_1 = 0, because if both were nonzero, the product q0r1q_0 r_1 would also be nonzero. This leads to the conclusion that either q0r0=0q_0 r_0 = 0 (in case q0=0q_0 = 0) or q1r1=0q_1 r_1 = 0 (in case r1=0r_1 = 0). We see, however, that neither of those equalities can be true because we must have q0r0=1/2q_0 r_0 = 1/2 and q1r1=1/2.q_1 r_1 = 1/2. Hence, there do not exist vectors ϕ\vert\phi\rangle and ψ\vert\psi\rangle satisfying the property required for independence.

Having defined independence between two systems, we can now define what is meant by correlation: it is a lack of independence. For example, because the two bits in the probabilistic state represented by the vector (5)(5) are not independent, they are, by definition, correlated.

Tensor products of vectors

The condition of independence just described can be expressed succinctly through the notion of a tensor product. Although tensor products are a very general notion, and can be defined quite abstractly and applied to a variety of mathematical structures, we can adopt a simple and concrete definition in the case at hand.

Given two vectors

ϕ=aΣαaaandψ=bΓβbb,\vert \phi \rangle = \sum_{a\in\Sigma} \alpha_a \vert a \rangle \quad\text{and}\quad \vert \psi \rangle = \sum_{b\in\Gamma} \beta_b \vert b \rangle,

the tensor product ϕψ\vert \phi \rangle \otimes \vert \psi \rangle is the vector defined as

ϕψ=(a,b)Σ×Γαaβbab. \vert \phi \rangle \otimes \vert \psi \rangle = \sum_{(a,b)\in\Sigma\times\Gamma} \alpha_a \beta_b \vert ab\rangle.

The entries of this new vector correspond to the elements of the Cartesian product Σ×Γ,\Sigma\times\Gamma, which are written as strings in the previous equation. Equivalently, the vector π=ϕψ\vert \pi \rangle = \vert \phi \rangle \otimes \vert \psi \rangle is defined by the equation

abπ=aϕbψ\langle ab \vert \pi \rangle = \langle a \vert \phi \rangle \langle b \vert \psi \rangle

being true for every aΣa\in\Sigma and bΓ.b\in\Gamma.

We can now recast the condition for independence: for a joint system (X,Y)(\mathsf{X}, \mathsf{Y}) in a probabilistic state represented by a probability vector π,\vert \pi \rangle, the systems X\mathsf{X} and Y\mathsf{Y} are independent if π\vert\pi\rangle is obtained by taking a tensor product

π=ϕψ \vert \pi \rangle = \vert \phi \rangle \otimes \vert \psi \rangle

of probability vectors ϕ\vert \phi \rangle and ψ\vert \psi \rangle on each of the subsystems X\mathsf{X} and Y.\mathsf{Y}. In this situation, π\vert \pi \rangle is said to be a product state or product vector.

We often omit the symbol \otimes when taking the tensor product of kets, such as writing ϕψ\vert \phi \rangle \vert \psi \rangle rather than ϕψ.\vert \phi \rangle \otimes \vert \psi \rangle. This convention captures the idea that the tensor product is, in this context, the most natural or default way to take the product of two vectors. Although it is less common, the notation ϕψ\vert \phi\otimes\psi\rangle is also sometimes used.

When we use the alphabetical convention for ordering elements of Cartesian products, we obtain the following specification for the tensor product of two column vectors.

(α1αm)(β1βk)=(α1β1α1βkα2β1α2βkαmβ1αmβk) \begin{pmatrix} \alpha_1\\ \vdots\\ \alpha_m \end{pmatrix} \otimes \begin{pmatrix} \beta_1\\ \vdots\\ \beta_k \end{pmatrix} = \begin{pmatrix} \alpha_1 \beta_1\\ \vdots\\ \alpha_1 \beta_k\\ \alpha_2 \beta_1\\ \vdots\\ \alpha_2 \beta_k\\ \vdots\\ \alpha_m \beta_1\\ \vdots\\ \alpha_m \beta_k \end{pmatrix}

As an important aside, notice the following expression for tensor products of standard basis vectors:

ab=ab.\vert a \rangle \otimes \vert b \rangle = \vert ab \rangle.

We could alternatively write (a,b)(a,b) as an ordered pair, rather than a string, in which case we obtain ab=(a,b).\vert a \rangle \otimes \vert b \rangle = \vert (a,b) \rangle. It is, however, more common to omit the parentheses in this situation, instead writing ab=a,b.\vert a \rangle \otimes \vert b \rangle = \vert a,b \rangle. This is typical in mathematics more generally; parentheses that don't add clarity or remove ambiguity are often simply omitted.

The tensor product of two vectors has the important property that it is bilinear, which means that it is linear in each of the two arguments separately, assuming that the other argument is fixed. This property can be expressed through these equations:

1. Linearity in the first argument:

(ϕ1+ϕ2)ψ=ϕ1ψ+ϕ2ψ(αϕ)ψ=α(ϕψ)\begin{aligned} \bigl(\vert\phi_1\rangle + \vert\phi_2\rangle\bigr)\otimes \vert\psi\rangle & = \vert\phi_1\rangle \otimes \vert\psi\rangle + \vert\phi_2\rangle \otimes \vert\psi\rangle \\[1mm] \bigl(\alpha \vert \phi \rangle\bigr) \otimes \vert \psi \rangle & = \alpha \bigl(\vert \phi \rangle \otimes \vert \psi \rangle \bigr) \end{aligned}

2. Linearity in the second argument:

ϕ(ψ1+ψ2)=ϕψ1+ϕψ2ϕ(αψ)=α(ϕψ)\begin{aligned} \vert \phi \rangle \otimes \bigl(\vert \psi_1 \rangle + \vert \psi_2 \rangle \bigr) & = \vert \phi \rangle \otimes \vert \psi_1 \rangle + \vert \phi \rangle \otimes \vert \psi_2 \rangle\\[1mm] \vert \phi \rangle \otimes \bigl(\alpha \vert \psi \rangle \bigr) & = \alpha \bigl(\vert\phi\rangle\otimes\vert\psi\rangle\bigr) \end{aligned}

Considering the second equation in each of these pairs of equations, we see that scalars "float freely" within tensor products:

(αϕ)ψ=ϕ(αψ)=α(ϕψ).\bigl(\alpha \vert \phi \rangle\bigr) \otimes \vert \psi \rangle = \vert \phi \rangle \otimes \bigl(\alpha \vert \psi \rangle \bigr) = \alpha \bigl(\vert \phi \rangle \otimes \vert \psi \rangle \bigr).

There is therefore no ambiguity in simply writing αϕψ,\alpha\vert\phi\rangle\otimes\vert\psi\rangle, or alternatively αϕψ\alpha\vert\phi\rangle\vert\psi \rangle or αϕψ,\alpha\vert\phi\otimes\psi\rangle, to refer to this vector.

Independence and tensor products for three or more systems

The notions of independence and tensor products generalize straightforwardly to three or more systems. If X0,,Xn1\mathsf{X}_0,\ldots,\mathsf{X}_{n-1} are systems having classical state sets Σ0,,Σn1,\Sigma_0,\ldots,\Sigma_{n-1}, respectively, then a probabilistic state of the combined system (Xn1,,X0)(\mathsf{X}_{n-1},\ldots,\mathsf{X}_0) is a product state if the associated probability vector takes the form

ψ=ϕn1ϕ0 \vert \psi \rangle = \vert \phi_{n-1} \rangle \otimes \cdots \otimes \vert \phi_0 \rangle

for probability vectors ϕ0,,ϕn1\vert \phi_0 \rangle,\ldots,\vert \phi_{n-1}\rangle describing probabilistic states of X0,,Xn1.\mathsf{X}_0,\ldots,\mathsf{X}_{n-1}. Here, the definition of the tensor product generalizes in a natural way: the vector

ψ=ϕn1ϕ0\vert \psi \rangle = \vert \phi_{n-1} \rangle \otimes \cdots \otimes \vert \phi_0 \rangle

is defined by the equation

an1a0ψ=an1ϕn1a0ϕ0 \langle a_{n-1} \cdots a_0 \vert \psi \rangle = \langle a_{n-1} \vert \phi_{n-1} \rangle \cdots \langle a_0 \vert \phi_0 \rangle

being true for every a0Σ0,an1Σn1.a_0\in\Sigma_0, \ldots a_{n-1}\in\Sigma_{n-1}.

A different, but equivalent, way to define the tensor product of three or more vectors is recursively in terms of tensor products of two vectors:

ϕn1ϕ0=ϕn1(ϕn2ϕ0). \vert \phi_{n-1} \rangle \otimes \cdots \otimes \vert \phi_0 \rangle = \vert \phi_{n-1} \rangle \otimes \bigl( \vert \phi_{n-2} \rangle \otimes \cdots \otimes \vert \phi_0 \rangle \bigr).

Similar to the tensor product of just two vectors, the tensor product of three or more vectors is linear in each of the arguments individually, assuming that all other arguments are fixed. In this case it is said that the tensor product of three or more vectors is multilinear.

Like in the case of two systems, we could say that the systems X0,,Xn1\mathsf{X}_0,\ldots,\mathsf{X}_{n-1} are independent when they are in a product state, but the term mutually independent is more precise. There happen to be other notions of independence for three or more systems, such as pairwise independence, that are both interesting and important — but not in the context of this course.

Generalizing the observation earlier concerning tensor products of standard basis vectors, for any positive integer nn and any classical states a0,,an1,a_0,\ldots,a_{n-1}, we have

an1a0=an1a0.\vert a_{n-1} \rangle \otimes \cdots \otimes \vert a_0 \rangle = \vert a_{n-1} \cdots a_0 \rangle.

Measurements of probabilistic states

Now let us move on to measurements of probabilistic states of multiple systems. By choosing to view multiple systems together as single systems, we immediately obtain a specification of how measurements must work for multiple systems — provided that all of the systems are measured.

For example, if the probabilistic state of two bits (X,Y)(\mathsf{X},\mathsf{Y}) is described by the probability vector

1200+1211, \frac{1}{2} \vert 00 \rangle + \frac{1}{2} \vert 11 \rangle,

then the outcome 0000 — meaning 00 for the measurement of X\mathsf{X} and 00 for the measurement of Y\mathsf{Y} — is obtained with probability 1/21/2 and the outcome 1111 is also obtained with probability 1/2.1/2. In each case we update the probability vector description of our knowledge accordingly, so that the probabilistic state becomes 00|00\rangle or 11,|11\rangle, respectively.

We could, however, choose to measure not every system, but instead just some of the systems. This will result in a measurement outcome for each system that gets measured, and will also (in general) affect our knowledge of the remaining systems that we didn't measure.

To explain how this works, we'll focus on the case of two systems, one of which is measured. The more general situation — in which some proper subset of three or more systems is measured — effectively reduces to the case of two systems when we view the systems that are measured collectively as if they form one system and the systems that are not measured as if they form a second system.

To be precise, let's suppose that X\mathsf{X} and Y\mathsf{Y} are systems whose classical state sets are Σ\Sigma and Γ,\Gamma, respectively, and that the two systems together are in some probabilistic state. We'll consider what happens when we measure just X\mathsf{X} and do nothing to Y.\mathsf{Y}. The situation where just Y\mathsf{Y} is measured and nothing happens to X\mathsf{X} is handled symmetrically.

First, we know that the probability to observe a particular classical state aΣa\in\Sigma when just X\mathsf{X} is measured must be consistent with the probabilities we would obtain under the assumption that Y\mathsf{Y} was also measured. That is, we must have

Pr(X=a)=bΓPr((X,Y)=(a,b)). \operatorname{Pr}(\mathsf{X} = a) = \sum_{b\in\Gamma} \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (a,b) \bigr).

This is the formula for the so-called reduced (or marginal) probabilistic state of X\mathsf{X} alone.

This formula makes perfect sense at an intuitive level, in the sense that something very strange would have to happen for it to be wrong. If it were wrong, that would mean that measuring Y\mathsf{Y} could somehow influence the probabilities associated with different outcomes of the measurement of X,\mathsf{X}, irrespective of the actual outcome of the measurement of Y.\mathsf{Y}. If Y\mathsf{Y} happened to be in a distant location, such as somewhere in another galaxy for instance, this would allow for faster-than-light signaling — which we reject based on our understanding of physics. Another way to understand this comes from the interpretation of probability as reflecting a degree of belief. The mere fact that someone else might decide to look at Y\mathsf{Y} cannot change the classical state of X,\mathsf{X}, so without any information about what they did or didn't see, one's beliefs about the state of X\mathsf{X} should not change as a result.

Now, given the assumption that only X\mathsf{X} is measured and Y\mathsf{Y} is not, there may still exist uncertainty about the classical state of Y.\mathsf{Y}. For this reason, rather than updating our description of the probabilistic state of (X,Y)(\mathsf{X},\mathsf{Y}) to ab\vert ab\rangle for some selection of aΣa\in\Sigma and bΓ,b\in\Gamma, we must update our description so that this uncertainty about Y\mathsf{Y} is properly reflected.

The following conditional probability formula reflects this uncertainty.

Pr(Y=bX=a)=Pr((X,Y)=(a,b))Pr(X=a) \operatorname{Pr}(\mathsf{Y} = b \,\vert\, \mathsf{X} = a) = \frac{ \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (a,b)\bigr) }{ \operatorname{Pr}(\mathsf{X} = a) }

Here, the expression Pr(Y=bX=a)\operatorname{Pr}(\mathsf{Y} = b \,\vert\, \mathsf{X} = a) denotes the probability that Y=b\mathsf{Y} = b conditioned on (or given that) X=a.\mathsf{X} = a. Technically speaking, this expression only makes sense if Pr(X=a)\operatorname{Pr}(\mathsf{X}=a) is nonzero, for if Pr(X=a)=0,\operatorname{Pr}(\mathsf{X}=a) = 0, then we're dividing by zero and we obtain indeterminate form 00.\frac{0}{0}. This is not a problem, though, because if the probability associated with aa is zero, then we'll never obtain aa as an outcome of a measurement of X,\mathsf{X}, so we don't need to be concerned with this possibility.

To express these formulas in terms of probability vectors, consider a probability vector ψ\vert \psi \rangle describing a joint probabilistic state of (X,Y).(\mathsf{X},\mathsf{Y}).

ψ=(a,b)Σ×Γpabab \vert\psi\rangle = \sum_{(a,b)\in\Sigma\times\Gamma} p_{ab} \vert ab\rangle

Measuring X\mathsf{X} alone yields each possible outcome aΣa\in\Sigma with probability

Pr(X=a)=cΓpac. \operatorname{Pr}(\mathsf{X} = a) = \sum_{c\in\Gamma} p_{ac}.

The vector representing the probabilistic state of X\mathsf{X} alone is therefore given by

aΣ(cΓpac)a. \sum_{a\in\Sigma} \biggl(\sum_{c\in\Gamma} p_{ac}\biggr) \vert a\rangle.

Having obtained a particular outcome aΣa\in\Sigma of the measurement of X,\mathsf{X}, the probabilistic state of Y\mathsf{Y} is updated according to the formula for conditional probabilities, so that it is represented by this probability vector:

πa=bΓpabbcΓpac. \vert \pi_a \rangle = \frac{\sum_{b\in\Gamma}p_{ab}\vert b\rangle}{\sum_{c\in\Gamma} p_{ac}}.

In the event that the measurement of X\mathsf{X} resulted in the classical state a,a, we therefore update our description of the probabilistic state of the joint system (X,Y)(\mathsf{X},\mathsf{Y}) to aπa.\vert a\rangle \otimes \vert\pi_a\rangle.

One way to think about this definition of πa\vert\pi_a\rangle is to see it as a normalization of the vector bΓpabb,\sum_{b\in\Gamma} p_{ab} \vert b\rangle, where we divide by the sum of the entries in this vector to obtain a probability vector. This normalization effectively accounts for a conditioning on the event that the measurement of X\mathsf{X} has resulted in the outcome a.a.

For a specific example, suppose that classical state set of X\mathsf{X} is Σ={0,1},\Sigma = \{0,1\}, the classical state set of Y\mathsf{Y} is Γ={1,2,3},\Gamma = \{1,2,3\}, and the probabilistic state of (X,Y)(\mathsf{X},\mathsf{Y}) is

ψ=120,1+1120,3+1121,1+161,2+161,3. \vert \psi \rangle = \frac{1}{2} \vert 0,1 \rangle + \frac{1}{12} \vert 0,3 \rangle + \frac{1}{12} \vert 1,1 \rangle + \frac{1}{6} \vert 1,2 \rangle + \frac{1}{6} \vert 1,3 \rangle.

Our goal will be to determine the probabilities of the two possible outcomes (00 and 11), and to calculate what the resulting probabilistic state of Y\mathsf{Y} is for the two outcomes, assuming the system X\mathsf{X} is measured.

Using the bilinearity of the tensor product, and specifically the fact that it is linear in the second argument, we may rewrite the vector ψ\vert \psi \rangle as follows:

ψ=0(121+1123)+1(1121+162+163). \vert \psi \rangle = \vert 0\rangle \otimes \biggl( \frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle\biggr) + \vert 1\rangle \otimes \biggl( \frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle + \frac{1}{6} \vert 3 \rangle\biggr).

In words, what we've done is to isolate the distinct standard basis vectors for the first system (i.e., the one being measured), tensoring each with the linear combination of standard basis vectors for the second system we get by picking out the entries of the original vector that are consistent with the corresponding classical state of the first system. A moment's thought reveals that this is always possible, regardless of what vector we started with.

Having expressed our probability vector in this way, the effects of measuring the first system become easy to analyze. The probabilities of the two outcomes can be obtained by summing the probabilities in parentheses.

Pr(X=0)=12+112=712Pr(X=1)=112+16+16=512 \begin{aligned} \operatorname{Pr}(\mathsf{X} = 0) & = \frac{1}{2} + \frac{1}{12} = \frac{7}{12}\\[3mm] \operatorname{Pr}(\mathsf{X} = 1) & = \frac{1}{12} + \frac{1}{6} + \frac{1}{6} = \frac{5}{12} \end{aligned}

These probabilities sum to one, as expected — but this is a useful check on our calculations.

And now, the probabilistic state of Y\mathsf{Y} conditioned on each possible outcome can be inferred by normalizing the vectors in parentheses. That is, we divide these vectors by the associated probabilities we just calculated, so that they become probability vectors.

Thus, conditioned on X\mathsf{X} being 0,0, the probabilistic state of Y\mathsf{Y} becomes

121+1123712=671+173, \frac{\frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle}{\frac{7}{12}} = \frac{6}{7} \vert 1 \rangle + \frac{1}{7} \vert 3 \rangle,

and conditioned on the measurement of X\mathsf{X} being 1,1, the probabilistic state of Y\mathsf{Y} becomes

1121+162+163512=151+252+253. \frac{\frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle + \frac{1}{6} \vert 3 \rangle}{\frac{5}{12}} = \frac{1}{5} \vert 1 \rangle + \frac{2}{5} \vert 2 \rangle + \frac{2}{5} \vert 3 \rangle.

Operations on probabilistic states

To conclude this discussion of classical information for multiple systems, we'll consider operations on multiple systems in probabilistic states. Following the same idea as before, we can view multiple systems collectively as single, compound systems, and then look to the previous lesson to see how this works.

Returning to the typical set-up where we have two systems X\mathsf{X} and Y,\mathsf{Y}, let us consider classical operations on the compound system (X,Y).(\mathsf{X},\mathsf{Y}). Based on the previous lesson and the discussion above, we conclude that any such operation is represented by a stochastic matrix whose rows and columns are indexed by the Cartesian product Σ×Γ.\Sigma\times\Gamma.

For example, suppose that X\mathsf{X} and Y\mathsf{Y} are bits, and consider an operation with the following description.

If X=1,\mathsf{X} = 1, then perform a NOT operation on Y.\mathsf{Y}.
Otherwise do nothing.

This is a deterministic operation known as a controlled-NOT operation, where X\mathsf{X} is the control bit that determines whether or not a NOT operation should be applied to the target bit Y.\mathsf{Y}. Here is the matrix representation of this operation:

(1000010000010010).\begin{pmatrix} 1 & 0 & 0 & 0\\[2mm] 0 & 1 & 0 & 0\\[2mm] 0 & 0 & 0 & 1\\[2mm] 0 & 0 & 1 & 0 \end{pmatrix}.

Its action on standard basis states is as follows.

0000010110111110\begin{aligned} \vert 00 \rangle & \mapsto \vert 00 \rangle\\ \vert 01 \rangle & \mapsto \vert 01 \rangle\\ \vert 10 \rangle & \mapsto \vert 11 \rangle\\ \vert 11 \rangle & \mapsto \vert 10 \rangle \end{aligned}

If we were to exchange the roles of X\mathsf{X} and Y,\mathsf{Y}, taking Y\mathsf{Y} to be the control bit and X\mathsf{X} to be the target bit, then the matrix representation of the operation would become

(1000000100100100)\begin{pmatrix} 1 & 0 & 0 & 0\\[2mm] 0 & 0 & 0 & 1\\[2mm] 0 & 0 & 1 & 0\\[2mm] 0 & 1 & 0 & 0 \end{pmatrix}

and its action on standard basis states would be like this:

0000011110101101\begin{aligned} \vert 00 \rangle & \mapsto \vert 00 \rangle\\ \vert 01 \rangle & \mapsto \vert 11 \rangle\\ \vert 10 \rangle & \mapsto \vert 10 \rangle\\ \vert 11 \rangle & \mapsto \vert 01 \rangle \end{aligned}

Another example is the operation having this description:

Perform one of the following two operations, each with probability 1/2:1/2:

  1. Set Y\mathsf{Y} to be equal to X.\mathsf{X}.
  2. Set X\mathsf{X} to be equal to Y.\mathsf{Y}.

The matrix representation of this operation is as follows:

(11212000000000012121)=12(1100000000000011)+12(1010000000000101).\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{2} & 0\\[2mm] 0 & 0 & 0 & 0\\[2mm] 0 & 0 & 0 & 0\\[2mm] 0 & \frac{1}{2} & \frac{1}{2} & 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 & 0\\[2mm] 0 & 0 & 0 & 0\\[2mm] 0 & 0 & 0 & 0\\[2mm] 0 & 0 & 1 & 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 & 0 & 1 & 0\\[2mm] 0 & 0 & 0 & 0\\[2mm] 0 & 0 & 0 & 0\\[2mm] 0 & 1 & 0 & 1 \end{pmatrix}.

The action of this operation on standard basis vectors is as follows:

0000011200+1211101200+12111111\begin{aligned} \vert 00 \rangle & \mapsto \vert 00 \rangle\\[1mm] \vert 01 \rangle & \mapsto \frac{1}{2} \vert 00 \rangle + \frac{1}{2}\vert 11\rangle\\[3mm] \vert 10 \rangle & \mapsto \frac{1}{2} \vert 00 \rangle + \frac{1}{2}\vert 11\rangle\\[2mm] \vert 11 \rangle & \mapsto \vert 11 \rangle \end{aligned}

In these examples, we are simply viewing two systems together as a single system and proceeding as in the previous lesson.

The same thing can be done for any number of systems. For example, imagine that we have three bits, and we increment the three bits modulo 88 — meaning that we think about the three bits as encoding a number between 00 and 77 using binary notation, add 1,1, and then take the remainder after dividing by 8.8. One way to express this operation is like this:

001000+010001+011010+100011+101100+110101+111110+000111.\begin{aligned} & \vert 001 \rangle \langle 000 \vert + \vert 010 \rangle \langle 001 \vert + \vert 011 \rangle \langle 010 \vert + \vert 100 \rangle \langle 011 \vert\\[1mm] & \quad + \vert 101 \rangle \langle 100 \vert + \vert 110 \rangle \langle 101 \vert + \vert 111 \rangle \langle 110 \vert + \vert 000 \rangle \langle 111 \vert. \end{aligned}

Another way to express it is as

k=07(k+1)mod8k,\sum_{k = 0}^{7} \vert (k+1) \bmod 8 \rangle \langle k \vert,

assuming we've agreed that numbers from 00 to 77 inside of kets refer to the three-bit binary encodings of those numbers. A third option is to express this operation as a matrix.

(0000000110000000010000000010000000010000000010000000010000000010)\begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}

Independent operations

Now suppose that we have multiple systems and we independently perform different operations on the systems separately.

For example, taking our usual set-up of two systems X\mathsf{X} and Y\mathsf{Y} having classical state sets Σ\Sigma and Γ,\Gamma, respectively, let us suppose that we perform one operation on X\mathsf{X} and, completely independently, another operation on Y.\mathsf{Y}. As we know from the previous lesson, these operations are represented by stochastic matrices — and to be precise, let us say that the operation on X\mathsf{X} is represented by the matrix MM and the operation on Y\mathsf{Y} is represented by the matrix N.N. Thus, the rows and columns of MM have indices that are placed in correspondence with the elements of Σ\Sigma and, likewise, the rows and columns of NN correspond to the elements of Γ.\Gamma.

A natural question to ask is this: if we view X\mathsf{X} and Y\mathsf{Y} together as a single, compound system (X,Y),(\mathsf{X},\mathsf{Y}), what is the matrix that represents the combined action of the two operations on this compound system? To answer this question we must first introduce tensor products of matrices, which are similar to tensor products of vectors and are defined analogously.

Tensor products of matrices

The tensor product MNM\otimes N of the matrices

M=a,bΣαabab M = \sum_{a,b\in\Sigma} \alpha_{ab} \vert a\rangle \langle b\vert

and

N=c,dΓβcdcd N = \sum_{c,d\in\Gamma} \beta_{cd} \vert c\rangle \langle d\vert

is the matrix

MN=a,bΣc,dΓαabβcdacbd M \otimes N = \sum_{a,b\in\Sigma} \sum_{c,d\in\Gamma} \alpha_{ab} \beta_{cd} \vert ac \rangle \langle bd \vert

Equivalently, the tensor product of MM and NN is defined by the equation

acMNbd=aMbcNd\langle ac \vert M \otimes N \vert bd\rangle = \langle a \vert M \vert b\rangle \langle c \vert N \vert d\rangle

being true for every selection of a,bΣa,b\in\Sigma and c,dΓ.c,d\in\Gamma.

An alternative, but equivalent, way to describe MNM\otimes N is that it is the unique matrix that satisfies the equation

(MN)(ϕψ)=(Mϕ)(Nψ) (M \otimes N) \bigl( \vert \phi \rangle \otimes \vert \psi \rangle \bigr) = \bigl(M \vert\phi\rangle\bigr) \otimes \bigl(N \vert\psi\rangle\bigr)

for every possible choice of vectors ϕ\vert\phi\rangle and ψ,\vert\psi\rangle, assuming that the indices of ϕ\vert\phi\rangle correspond to the elements of Σ\Sigma and the indices of ψ\vert\psi\rangle correspond to Γ.\Gamma.

Following the convention described previously for ordering the elements of Cartesian products, we can also write the tensor product of two matrices explicitly as follows:

(α11α1mαm1αmm)(β11β1kβk1βkk)=(α11β11α11β1kα1mβ11α1mβ1kα11βk1α11βkkα1mβk1α1mβkkαm1β11αm1β1kαmmβ11αmmβ1kαm1βk1αm1βkkαmmβk1αmmβkk)\begin{gathered} \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix} \otimes \begin{pmatrix} \beta_{11} & \cdots & \beta_{1k} \\ \vdots & \ddots & \vdots\\ \beta_{k1} & \cdots & \beta_{kk} \end{pmatrix} \hspace{6cm}\\[8mm] \hspace{1cm} = \begin{pmatrix} \alpha_{11}\beta_{11} & \cdots & \alpha_{11}\beta_{1k} & & \alpha_{1m}\beta_{11} & \cdots & \alpha_{1m}\beta_{1k} \\ \vdots & \ddots & \vdots & \hspace{2mm}\cdots\hspace{2mm} & \vdots & \ddots & \vdots \\ \alpha_{11}\beta_{k1} & \cdots & \alpha_{11}\beta_{kk} & & \alpha_{1m}\beta_{k1} & \cdots & \alpha_{1m}\beta_{kk} \\[2mm] & \vdots & & \ddots & & \vdots & \\[2mm] \alpha_{m1}\beta_{11} & \cdots & \alpha_{m1}\beta_{1k} & & \alpha_{mm}\beta_{11} & \cdots & \alpha_{mm}\beta_{1k} \\ \vdots & \ddots & \vdots & \hspace{2mm}\cdots\hspace{2mm} & \vdots & \ddots & \vdots \\ \alpha_{m1}\beta_{k1} & \cdots & \alpha_{m1}\beta_{kk} & & \alpha_{mm}\beta_{k1} & \cdots & \alpha_{mm}\beta_{kk} \end{pmatrix} \end{gathered}

Tensor products of three or more matrices are defined in an analogous way. If M0,,Mn1M_0, \ldots, M_{n-1} are matrices whose indices correspond to classical state sets Σ0,,Σn1,\Sigma_0,\ldots,\Sigma_{n-1}, then the tensor product Mn1M0M_{n-1}\otimes\cdots\otimes M_0 is defined by the condition that

an1a0Mn1M0bn1b0=an1Mn1bn1a0M0b0\langle a_{n-1}\cdots a_0 \vert M_{n-1}\otimes\cdots\otimes M_0 \vert b_{n-1}\cdots b_0\rangle = \langle a_{n-1} \vert M_{n-1} \vert b_{n-1} \rangle \cdots\langle a_0 \vert M_0 \vert b_0 \rangle

for every choice of classical states a0,b0Σ0,,an1,bn1Σn1.a_0,b_0\in\Sigma_0,\ldots,a_{n-1},b_{n-1}\in\Sigma_{n-1}. Alternatively, tensor products of three or more matrices can be defined recursively, in terms of tensor products of two matrices, similar to what we observed for vectors.

The tensor product of matrices is sometimes said to be multiplicative because the equation

(Mn1M0)(Nn1N0)=(Mn1Nn1)(M0N0) (M_{n-1}\otimes\cdots\otimes M_0)(N_{n-1}\otimes\cdots\otimes N_0) = (M_{n-1} N_{n-1})\otimes\cdots\otimes (M_0 N_0)

is always true, for any choice of matrices M0,,Mn1M_0,\ldots,M_{n-1} and N0,Nn1,N_0\ldots,N_{n-1}, provided that the products M0N0,,Mn1Nn1M_0 N_0, \ldots, M_{n-1} N_{n-1} make sense.

Independent operations (continued)

We can now answer the question asked previously: if MM is a probabilistic operation on X,\mathsf{X}, NN is a probabilistic operation on Y,\mathsf{Y}, and the two operations are performed independently, then the resulting operation on the compound system (X,Y)(\mathsf{X},\mathsf{Y}) is the tensor product MN.M\otimes N.

So, for both probabilistic states and probabilistic operations, tensor products represent independence. If we have two systems X\mathsf{X} and Y\mathsf{Y} that are independently in the probabilistic states ϕ\vert\phi\rangle and π,\vert\pi\rangle, then the compound system (X,Y)(\mathsf{X},\mathsf{Y}) is in the probabilistic state ϕπ;\vert\phi\rangle\otimes\vert\pi\rangle; and if we apply probabilistic operations MM and NN to the two systems independently, then the resulting action on the compound system (X,Y)(\mathsf{X},\mathsf{Y}) is described by the operation MN.M\otimes N.

Let's take a look at an example, which recalls a probabilistic operation on a single bit from the previous lesson: if the classical state of the bit is 0,0, it is left alone; and if the classical state of the bit is 1,1, it is flipped to 0 with probability 1/2.1/2. We observed that this operation is represented by the matrix

(112012). \begin{pmatrix} 1 & \frac{1}{2}\\[1mm] 0 & \frac{1}{2} \end{pmatrix}.

If this operation is performed on a bit X,\mathsf{X}, and a NOT operation is (independently) performed on a second bit Y,\mathsf{Y}, then the joint operation on the compound system (X,Y)(\mathsf{X},\mathsf{Y}) has the matrix representation

(112012)(0110)=(01012101200001200120). \begin{pmatrix} 1 & \frac{1}{2}\\[1mm] 0 & \frac{1}{2} \end{pmatrix} \otimes \begin{pmatrix} 0 & 1\\[1mm] 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & \frac{1}{2} \\[1mm] 1 & 0 & \frac{1}{2} & 0 \\[1mm] 0 & 0 & 0 & \frac{1}{2} \\[1mm] 0 & 0 & \frac{1}{2} & 0 \end{pmatrix}.

By inspection, we see that this is a stochastic matrix. This will always be the case: the tensor product of two or more stochastic matrices is always stochastic.

A common situation that we encounter is one in which one operation is performed on one system and nothing is done to another. In such a case, exactly the same prescription is followed, bearing in mind that doing nothing is represented by the identity matrix. For example, resetting the bit X\mathsf{X} to the 00 state and doing nothing to Y\mathsf{Y} yields the probabilistic (and in fact deterministic) operation on (X,Y)(\mathsf{X},\mathsf{Y}) represented by the matrix

(1100)(1001)=(1010010100000000). \begin{pmatrix} 1 & 1\\[1mm] 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0\\[1mm] 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 0 \\[1mm] 0 & 1 & 0 & 1 \\[1mm] 0 & 0 & 0 & 0 \\[1mm] 0 & 0 & 0 & 0 \end{pmatrix}.
Was this page helpful?
Report a bug or request content on GitHub.