by Aditya Mukerjee
When I applied to college, I was intent on studying physics. Quantum mechanics in particular fascinated me. It danced a fine line between science and poetry, with a dazzling constellation of paradoxes and counterintuitive contradictions. It explains how our universe is built on the most fundamental, “picoscopic” level, while only revealing these secrets as riddles.
Instead, I ended up studying in a different field: statistics. I was drawn to statistics because I saw it as a powerful tool. While physics explains how our world works, statistics explains the process of how we learn how our world works.
In that power that statistics holds, there exists an elegance as well, though I didn’t know it at first. It was several classes before I realized that statistics, like quantum mechanics, is its own fractal of contradictions, each more bizarre than the last.
“The trouble with random numbers,” my professor said one day, almost as an aside, “is that they’re not random, and they’re not numbers.” I had never really noticed it until he mentioned this, but I realized that we had been using random numbers in our classes all this time without really ever clarifying what they were.
The reason for this is historical—the field of statistics evolved very differently from calculus. Some notions of what randomness and unpredictability were have been around for a few millennia; Senet, an ancient Egyptian game of dice, is over 5,000 years old. By the time many technical terms like ‘probability’ and ‘random’ were rigorously defined by statisticians, they had already entered household usage with meanings similar to the ones they hold today. As a result, many professors invoke these terms without remembering to check the basic assumption: do we actually know what these words mean?
Why Random Variables Are Not Variables
Cryptography is built on randomness, and understanding randomness is fundamental to understanding how cryptography works. Randomness alone isn’t sufficient—randomness doesn’t inherently make cryptography secure. Yet, a lack of randomness does always mean that cryptography is insecure.
Debian developers know this firsthand. Ten years ago, it was discovered that Debian generated ineffective SSL keys—essentially nullifying the security of all of their users’ systems—for a period of almost two years. This security disaster turned out to be caused by an incorrect method that was being used for random number generation. (For concerned Linux users reading this, this bug has long been fixed, and if you’re particularly concerned, it’s even possible to check your key against the list of keys that are now known to be weak.)
Generating numbers randomly is a complex topic. We don’t have to search hard on the Internet to find heated, back-and-forth debates between cryptographers about the proper way to obtain randomly-generated numbers that are used as the input to cryptographic functions. But non-cryptographers need not feel left out of the fun! A few years ago, one wry user posted the following message on the Arch Linux forums:
Can anybody tell me if this number is random or not?:
A lively debate followed, with answers ranging from ‘that depends on how it was generated’ to ’14 is a random number, but not nearly as random as 17, which is the most random number’. My favorite would be:
$ cat /dev/urandom | grep 14
Binary file (standard input) matches
…looks fine to me.
The discussion attracted enough attention from outside the forum that the original user even started selling t-shirts that said, ’14 is a random number’. He also created t-shirts that said ’14 is not a random number’ so everyone would be able to participate, regardless of which side they took.
In trying to figure out the correct answer to this question, it helps to understand what variables are. To mathematicians, a variable represents an unknown number in an expression. If Alice is three inches taller than Bob, we can describe Alice’s height y in terms of Bob’s height x as
y = x + 3. We don’t know how tall either Alice or Bob is, but if we discover either one’s height, we can figure out the other’s. Once we know what x is, we know what y is. But more importantly—and this shouldn’t be taken for granted—once we know what x is, we know what x is.
You might think that’s tautological—how could that statement ever not be true? Random variables are elusive; we can never actually observe one directly. If we have a random variable X, there is no way to actually determine its value. We can make an observation—let’s call this x̂₁—but that’s not actually the random variable itself. Since we can never actually observe the random variable X, instead we look at its ‘shadows’ (x̂₁, x̂₂, etc.) and make some guesses (called inferences) about what X actually is. No matter what we do, we can never actually observe the random variable itself.
It would make algebra quite difficult if variables changed their value every time you looked at them. Fortunately, it turns out we already have a great name for something-whose-value-can-change-every-time-we-evaluate-it: we call that a function.
Instead of thinking of X as a single variable, we think of X as a function that generates shadow numbers every time we look at it. Those numbers aren’t random either. If Alice and Bob decide to flip a coin to see who gets the last slice of cake, once Alice flips the coin and sees that it lands heads-up, she can show it to Bob and be confident that he will see that it landed heads-up once he looks at it. (Of course, whether or not he admits it depends on whether “heads” means he won or lost the flip). After a value x̂ᵢ has been observed, it never changes, even though the function—in this case, the coin—may generate different values in the future.
With this reasoning, we can safely conclude that 14 is not a random number, because randomness is a property of functions, not numbers. The number 14 may have been generated by a random function, just like the coin flip was generated by a random process, or it could have been carefully chosen, just as it would have if Alice “rigged” the coin flip by changing the outcome when Bob was looking the other way. We don’t know; but either way, 14 is not a random number.
So can we just start calling “random variables” “random functions” instead and leave it at that? Unfortunately, the answer is no.
Why Random Variables Are Not Random
Until now, I’ve been using the word ‘random’ as if it were synonymous with ‘unpredictable’. And colloquially, we use the word ‘random’ to describe something that’s unpredictable. But paradoxically, true randomness is quite predictable! Randomness isn’t about being unpredictable; it’s about describing uncertainty in a measured way.
If Alice and Bob flip a fair coin and call heads and tails respectively, I may not know who will win the coin flip. But I do know that Alice and Bob have an equal chance of winning, and that it’s almost certain that either Alice or Bob will win. (Here’s another fun paradox with the two words ‘almost certainly’—if you ever hear a statistician say that something is “almost certain”, they really mean that it has a 100% chance of happening. The reason for this is beyond the scope of this article, but if you’re curious, it’s related to the reason that the number 9.999… is exactly equal to 1.)
We use the word ‘random’ colloquially to say that we don’t know anything about a situation (“It was entirely random”). But in reality, randomness means that we do know something. If we roll a normal six-sided die, we know the number that comes up has to be between 1 and 6. We don’t know exactly which of those six values we will roll, but we are able to describe with precision the exact set of possible outcomes and the specific probability of each outcome taking place. When you think about it, this makes the process of rolling a die rather boring. No single roll can ever really surprise us. (This also makes statisticians no fun to play cards with, because we will spend the entire game calculating the probability of drawing each hand.)
Fundamentally, randomness is repeatable—just not with absolute certainty. If Alice rolls a 5 on a fair, six-sided die, it’s possible that Bob will be able to repeat this with his next die roll as well, but he can’t guarantee it. It’s precisely this gray area of certainty-within-uncertainty that we refer to when we use the word ‘random’. If Bob were somehow able to reliably repeat Alice’s roll with a fair, six-sided die, we would not say the process is deterministic, not random.
This is a problem for computer scientists; if being able to specify a deterministic, repeatable process for generating numbers means that the process is not random, that suggests that computers are fundamentally incapable of randomly generating numbers. At the end of the day, all computer programs are just fancy Turing machines, which means that they execute a deterministic set of steps.
And as it turns out, that’s true! Computers are absolutely, theoretically incapable of random number generation. True randomness only exists at the quantum level, with quantum phenomena. (Even then, this was a hotly debated question for a long time, with Albert Einstein famously arguing against the existence of randomness and eventually being proven incorrect.)
But earlier, I said that we can’t have secure cryptography without randomness. Does this mean that we can’t have secure cryptographic software? Cryptographers get around this contradiction with some sleight-of-hand, essentially making the argument, “It’s only lying if you can tell the difference.” The ethics of this line of reasoning when applied to the rest of life notwithstanding, it is a solid enough principle for cryptography to work.
It turns out that cryptography doesn’t need numbers that were actually generated by a truly random function; it only needs the numbers to look like they were. The technical term for this is ‘computationally indistinguishable’, and it’s usually phrased as a game. Alice has two machines: one which generates a list of numbers pseudorandomly using a computer, and one which reads a list of truly randomly generated numbers (obtained from quantum experiments). She shows them to Bob, but doesn’t tell him which one is which. Bob reads the output of both machines and, after some time, is asked to identify the machine that contains the true source of randomness. Bob may perform any polynomial-time calculation he chooses on the numbers to help him tell the machines apart.
If Bob cannot reliably identify the two machines, they are said to be computationally indistinguishable. In this case, it is safe to pretend that the pseudorandom number generator (PRNG) is a true random number generator (RNG), and carry on with the rest of our work. There’s a little calculus involved in smoothing out the details—namely, figuring out how many numbers Bob gets to read before he has to guess—but in the end, it turns out that this is as secure as theoretically possible… without using quantum cryptography.
Quantum computing changes the game of indistinguishability. If Bob is given access to a quantum computer, he may be able to distinguish between the PRNG and the RNG when it would otherwise be impossible. The trick lies in the restriction that Bob is only allowed to read a small amount of the output from the two machines to compare them. (Formally, we say that he is allowed to read ‘polynomially many’ bits at a time, as opposed to ‘exponentially many’. The difference here between polynomial size and exponential size is analogous to their significance in the problem P≟NP.)
Much in the way that Schrödinger’s cat is simultaneously alive and dead, quantum bits (qubits) are simultaneously 1 and 0. This paradox means that Bob can perform a single calculation on a single 8-bit quantum number and in doing so, he can also perform the equivalent of 256 classical (non-quantum) calculations—one on every 8-bit number. Because he can perform operations exponentially faster, he can distinguish between the pseudorandom number generator and the random number generator with exponentially less input.
Quantum Computing and the Future of Cryptography
Quantum computing is still mostly theoretical, but it is undoubtedly a powerful force on the horizon. The security implications of quantum computing are both inspiring and terrifying. A large-scale quantum computer would be capable of breaking nearly all of the cryptography used today. Thanks to Edward Snowden, we now know that the NSA has spent nearly $100 million on developing a quantum computer. (This program is aptly named “Penetrating Hard Targets”.)
At the same time, quantum computing also presents opportunities for cryptography on a new scale—cryptography that is protected not only by the laws of statistics, but also by the laws of physics. Quantum computing could ensure that adversaries are unable to eavesdrop on conversations without being detected. One of the earliest quantum computer scientists even proposed a system for “quantum money”—in other words, money which would be impossible to counterfeit.
I mentioned earlier that quantum phenomena are the only source of true randomness in nature. This is not directly responsible for the cryptographic implications of quantum cryptography, but they are related. And in a way, they form a beautiful symmetry: statistics, the study of randomness, cannot be captured by normal physical models. Only with quantum mechanical models can we properly construct physical representations of the art of randomness—and only in this quantum setting can the hidden cryptographic power of statistics demonstrate its full potential.
Aditya is an entrepreneur, investor, and engineer based in New York City. He studied statistics at Columbia and computer science at Cornell, and spends his free time playing German-style board games and listening to embarrassing music.