A random number is a number generated by a process whose outcome is completely unpredictable and which cannot subsequently be reliably reproduced.

Although random numbers are required in many applications – from cryptosystems to gaming – their generation is often overlooked. To guarantee truly random numbers, an RNG must not be vulnerable to prediction or bias, and must therefore be driven by true randomness.

In cryptographic applications, the problem arises from how the keys are created. Most systems use a process called ‘Pseudo Random Number Generation’ (PRNG). Why ‘pseudo’? Because, due to their deterministic nature, conventional computers cannot generate true randomness. Information from inputs such as mouse movements, keystroke timings, disk interrupts or system timers is placed into a ‘pool’ of numbers, from which a ‘seed’ is picked. This seed is then used by the PRNG to generate the keys.
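
As a rough sketch of this flow (the names and the 128-bit key size are illustrative, not any particular system's API), a program might draw a seed from the operating system's entropy pool and feed it to a deterministic generator:

    import os
    import random

    # Draw 8 bytes from the OS entropy pool, which is itself fed by
    # interrupts, timers and other system events.
    seed = int.from_bytes(os.urandom(8), "big")
    prng = random.Random(seed)      # deterministic generator, seeded once
    key = prng.getrandbits(128)     # a 128-bit 'key' derived from the PRNG
    print(hex(key))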

The better solution is to use hardware random number generation. However, not all hardware random number generators are equally good. Many still rely on classical physics to produce what looks like a random stream of bits; for these generators, determinism is hidden behind complexity. Quantum physics, by contrast, is fundamentally random, and Quantum Random Number Generators (QRNGs) exploit this intrinsic randomness to generate truly random numbers.

In addition, QRNGs have the advantage over conventional randomness sources of being insensitive to environmental perturbations and of allowing live verification of their status.

For an infinite sequence, randomness can be defined formally: the sequence should not be compressible by any technique, known or unknown.

In the case of a finite sequence of numbers, it is formally impossible to verify whether it is random or not. It is only possible to check that it shares the statistical properties of a random sequence – such as the equi-probability of all numbers – but this is a difficult task.

To illustrate this, consider a binary random number generator producing sequences of ten bits. Although it is exactly as likely as any other ten-bit sequence, 1 1 1 1 1 1 1 1 1 1 does look less random than 0 1 1 0 1 0 1 0 0 0.
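
One of the simplest such statistical checks is the frequency (monobit) test, which verifies the equi-probability of 0s and 1s. The deliberately crude sketch below flags the all-ones sequence while accepting the second one:

    def monobit_fraction(bits):
        """Fraction of ones: should be close to 0.5 for a balanced sequence."""
        return sum(bits) / len(bits)

    print(monobit_fraction([1] * 10))                          # 1.0, suspicious
    print(monobit_fraction([0, 1, 1, 0, 1, 0, 1, 0, 0, 0]))    # 0.4, plausible

Real test suites, such as the NIST SP 800-22 battery, apply many tests of this kind; passing them is necessary but never sufficient evidence of randomness.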

To cope with this difficulty, definitions have been proposed to characterize “practical” random number sequences. According to Knuth, a sequence of random numbers is a sequence of independent numbers with a specified distribution and a specified probability of falling in any given range of values. Another common definition requires a sequence that has the same statistical properties as random bits, is unpredictable and cannot be reliably reproduced.

A concept present in both of these definitions, and which must be emphasized, is that the numbers in a random sequence must not be correlated: knowing one number of the sequence must not help predict the others. Whenever random numbers are mentioned in the rest of this paper, it will be assumed that they fulfil these “practical” definitions.

Computers are deterministic systems: given a certain input, a program will always produce the same output.

Because of this very fundamental property, it is impossible for a program to produce a sequence of random numbers. The sequence may have some of the properties of a random sequence, and thus pass some statistical randomness tests, but it is always possible to reproduce it.

As the sequences they produce look like random sequences, these generators are called pseudo-random number generators. It is however clear that they do not fulfil the definitions above. A pseudo-random number generator consists of an algorithm into which an initial value – the seed – is fed and which produces, by iteration, a sequence of pseudo-random numbers. Although the period can be made very long, the sequence produced by such a generator is always periodic. When working with large sets of random numbers or long sequences, it is important to verify that the period is large enough.
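
A linear congruential generator is perhaps the simplest example of such an algorithm. The toy version below (with a deliberately tiny modulus so the period is visible) illustrates both the seed-driven iteration and the periodicity:

    def lcg(seed, a=5, c=3, m=16):
        """Toy linear congruential generator: x -> (a*x + c) mod m."""
        x = seed
        while True:
            x = (a * x + c) % m
            yield x

    g1, g2 = lcg(7), lcg(7)
    seq = [next(g1) for _ in range(20)]
    print(seq)                                   # the 16 values start repeating
    assert seq == [next(g2) for _ in range(20)]  # same seed, same output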

One of the properties of the sequences produced in this way is that, as soon as one element of the sequence is known, all the other elements, both preceding and following, can be determined. This property is obviously critical when such numbers are used in cryptography for key generation. The sequences produced by a good pseudo-random generator initialized with an appropriate seed pass most statistical tests. However, it is important to realize that a sufficiently long sequence will always fail at least some tests, because of periodicity.
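
For the toy generator above this is easy to make explicit: because a is invertible modulo m, the previous state can be recovered from the current one, so a single observed value determines the sequence in both directions. (Widely used generators such as the Mersenne Twister can likewise be unwound once enough outputs have been observed.)

    # Stepping the toy LCG backwards: x_prev = a^(-1) * (x - c) mod m.
    def lcg_prev(x, a=5, c=3, m=16):
        a_inv = pow(a, -1, m)    # modular inverse (Python 3.8+): 5*13 = 65 = 1 mod 16
        return (a_inv * (x - c)) % m

    assert lcg_prev(6) == 7      # 7 -> 6 forwards, so 6 -> 7 backwards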

The seed.

One important issue when using pseudo-random number generators is the choice of the seed value. If one does not want to cycle endlessly through the same sequence, it is essential to change the seed value periodically. How should this seed value be chosen? Ideally, it should be random – but a pseudo-random generator is being used precisely because no random number generator is available. It is a catch-22 situation.

A solution, which is not always satisfactory, is entropy gathering: system information – the clock, the time interval between keystrokes, etc. – is combined to produce a seed. This technique however requires extreme caution.
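
A hedged sketch of what entropy gathering can look like (the quantities mixed here are illustrative and, on their own, far too guessable for serious use, which is exactly the caution the text calls for):

    import hashlib
    import os
    import time

    pool = b""
    pool += time.time_ns().to_bytes(8, "big")   # system clock
    pool += os.getpid().to_bytes(4, "big")      # process id
    # ... keystroke intervals, disk timings, etc. would be mixed in here ...
    seed = int.from_bytes(hashlib.sha256(pool).digest(), "big")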

Same seed, same output.

A security problem encountered by Netscape with its browser in 1995 illustrates the risks associated with the use of pseudo-random numbers in cryptography. At the time, the web was full of promise for online commerce, and Netscape had developed a protocol called SSL – the ancestor of the TLS protocol still in use today – to secure communications over the web.

As with other cryptographic protocols, the security of SSL crucially depends on the unpredictability of the key. The company implemented the protocol in its browser but relied on a pseudo-random number generator for key generation. Two Berkeley graduate students reverse-engineered the code of the browser and revealed a serious security flaw: the seed used by the pseudo-random number generator depended on the time of day and some system information. They showed that it was relatively easy to guess these quantities, and thus to drastically reduce the number of possible keys that had to be tried to crack the protocol.
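
The sketch below is a hypothetical reconstruction of the idea, not Netscape's actual code: if the attacker knows the seed is roughly the current timestamp, the keyspace collapses to a window of candidates that can be enumerated directly.

    import random

    def key_from_seed(seed):
        return random.Random(seed).getrandbits(128)

    victim_key = key_from_seed(1_700_000_123)    # victim seeded with a timestamp

    # Attacker scans the plausible window of seconds around the handshake.
    for t in range(1_700_000_000, 1_700_001_000):
        if key_from_seed(t) == victim_key:
            print("seed recovered:", t)
            break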

This flaw reduced the time needed for a brute-force attack on the protocol from more than thirty hours to a few minutes, and to as little as a few seconds in some special cases. The problem is also serious in other industries. For example, in 2014, a group of hackers managed to make millions by targeting slot machines from a well-known brand in the gaming industry. Casinos from the US to Romania to Macau were the main victims of the scam.

It appeared that the hackers had figured out how to predict the behavior of the slot machines’ PRNG. After prolonged observation and analysis of the machines’ behavior, they could identify the pattern of the PRNG algorithm used in them. This security flaw illustrates why pseudo-random number generators are usually considered inappropriate for high-security cryptographic and other sensitive applications.

In spite of the fact that they do not produce truly random numbers, PRNGs do have some advantages. First, their cost is virtually zero, as they can be implemented in software and numerous libraries are freely available. Second, their main disadvantage – namely that the sequences they produce are reproducible – can in certain cases be an advantage: when using random numbers for scientific calculations, it is sometimes useful to be able to replay a sequence of numbers when debugging the simulation program.

This is one of the reasons – the other being the lack of good physical random number generators – why these generators are widely used in such applications.

However, it is important, even in this field, to remain careful when using pseudo-random numbers. They have been shown to cause artefacts in certain simulations. A good approach is to use pseudo-random numbers when debugging, and then to resort to a physical generator to refine and confirm the simulation.

In applications where pseudo-random numbers are not appropriate, one must resort to using a physical random number generator. When using such a generator, it is essential to consider the physical process used as the randomness source.

This source can be based on a process described either by classical physics or by quantum physics. Classical physics is the set of theories developed by physicists between the 17th and 19th centuries to describe macroscopic systems like falling coins. Quantum physics is the set of theories elaborated by physicists during the first half of the 20th century to describe microscopic systems like atoms or elementary particles. Some examples of generators based on each of these theories, along with their advantages, are presented below, after a brief discussion of bias in random number sequences.

Bias.

A problem encountered with physical random number generators is their bias. A binary generator is said to be biased when the probability of one outcome is not equal to the probability of the other outcome.

Bias arises because of the difficulty of devising precisely balanced physical processes. It is, however, less of a problem than one might expect at first sight: post-processing algorithms exist that can remove bias from a sequence of random numbers. The simplest of these unbiasing procedures was first proposed by Von Neumann. The random bits of a sequence are grouped into subsequences of two bits. Whenever the two bits of a subsequence are equal, it is discarded. When the two bits differ and the subsequence starts with a 1, it is replaced by a 1; when it starts with a 0, it is replaced by a 0. After this procedure, the bias is removed from the sequence.
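
The procedure is short enough to state directly in code; a minimal implementation of exactly the pairing rule described above:

    import random

    def von_neumann(bits):
        """Von Neumann unbiasing: drop 00/11 pairs, keep the first bit of 01/10."""
        out = []
        for b1, b2 in zip(bits[0::2], bits[1::2]):
            if b1 != b2:
                out.append(b1)   # '10' -> 1, '01' -> 0
        return out

    raw = [1 if random.random() < 0.7 else 0 for _ in range(100_000)]  # 70% ones
    unbiased = von_neumann(raw)
    print(sum(unbiased) / len(unbiased))   # close to 0.5

It works because, for independent bits with bias p, the pairs 01 and 10 are equally likely (both have probability p(1-p)), so the surviving bits are exactly balanced.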

The cost of applying an unbiasing procedure to a sequence is that it is shortened. In the case of the Von Neumann procedure, the length of the unbiased sequence will be at most 25% of the length of the raw sequence: each two-bit subsequence yields at most one output bit, and even for an unbiased source only half of the subsequences survive on average. For a strongly biased source the yield is lower still.

We mentioned previously that randomness tests basically all amount to verifying whether the sequence can be compressed. An unbiasing procedure can be seen as a compression procedure: after its application the bias is removed and, provided the raw bits were independent, no further compression is possible, which helps the sequence pass the tests. Other unbiasing procedures exist; the one proposed by Peres, for example, is significantly more efficient than the Von Neumann procedure.
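
As a crude illustration of the compression view (zlib is one specific compressor, not a universal test), random bytes should be essentially incompressible while biased ones are not:

    import os
    import zlib

    raw = os.urandom(100_000)
    print(len(zlib.compress(raw, 9)) / len(raw))        # ratio about 1.0

    biased = bytes(b & 0x0F for b in raw)               # discard half the entropy
    print(len(zlib.compress(biased, 9)) / len(biased))  # ratio well below 1.0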

Deterministic complexity

Macroscopic processes described by classical physics can be used to generate random numbers. The most famous random number generator – coin tossing – indeed belongs to this class. However, it is very important to realize that classical physics is fundamentally deterministic. The evolution of a system described by classical physics can be predicted, assuming that the initial conditions are known.

In the case of a coin, a physicist knowing precisely its weight, its initial position, the force applied to it by the hand, the speed of the wind, as well as all the other relevant parameters, should in principle be able to predict the outcome of the throw. Why, then, is this prediction not possible in practice? Because coin tossing is a chaotic process: chaos is a type of behavior observed in systems whose evolution exhibits extreme sensitivity to initial conditions.

Coin tossing is not the only physical system with chaotic evolution. Turbulence in a flow (turbulence in lava lamps has been used to generate random numbers) and meteorological phenomena are good examples of chaotic systems. The evolution of these systems is so sensitive to initial conditions that it is simply not possible to determine them precisely enough to allow reliable prediction of their future evolution. In spite of its popularity, coin tossing is clearly not very practical when many random events are required. Other examples of physical random number generators based on chaotic processes include the monitoring of electrical noise in a resistor or in a Zener diode. Formally, the evolution of these generators is not random, just very complex: determinism is hidden behind complexity.

Although their output is likely to pass randomness tests, these generators are difficult to model. This means that it is impossible to verify, while acquiring numbers, that they are operating properly. In addition, it is difficult to ensure that the system is not interacting – even in a subtle way – with its environment, which could alter the quality of its output.

Quantum world: true randomness

Contrary to classical physics, quantum physics is fundamentally random. It is the only theory within the fabric of modern physics that integrates randomness.

This fact was deeply disturbing to some of the founders of quantum physics, Einstein among them. However, its intrinsic randomness has been confirmed again and again by theoretical and experimental research conducted since the first decades of the 20th century. When designing a random number generator, it is therefore natural to take advantage of this intrinsic randomness and to use a quantum process as the source of randomness.

Formally, quantum random number generators are the only true random number generators. Although this observation may be important in certain cases, quantum random number generators have other advantages.

The intrinsic randomness of quantum physics makes it possible to select a very simple process as the source of randomness. Such a generator is therefore easy to model, and its functioning can be monitored in order to confirm that it is operating properly and actually producing random numbers.

Contrary to the case where classical physics is used as the source of randomness, where determinism is hidden behind complexity, one can say that with quantum physics randomness is revealed by simplicity. Until recently, the only quantum random number generators that existed were based on the observation of the radioactive decay of some element. Although they produce numbers of excellent quality, these generators are quite bulky, and the use of radioactive materials may cause health concerns. The absence of simple and low-cost quantum random number generators long prevented quantum physics from becoming the dominant source of randomness.

Implementation

Optics is the science of light. From a quantum physics point of view, light consists of elementary “particles” called photons, which in certain situations exhibit fundamentally random behavior.

One such situation, which is very well suited to the generation of binary random numbers, is transmission through a semi-transparent mirror. Whether a photon incident on such a component is reflected or transmitted is intrinsically random and cannot be influenced by any external parameter.

In this configuration we have a balanced beam splitter with equal transmissivity and reflectivity, so that classical light entering either of the two input ports would be divided into two streams of the same optical power, half going through and half being reflected. If we have a single photon in one input and vacuum in the second, the power cannot be divided and we obtain the desired path superposition. Conceptually, the simplest way to produce random numbers from this path division is to place two detectors, D0 and D1, one at each output, and to generate a bit every time a photon is detected.
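
The statistics of this scheme are easy to simulate classically (the sketch below mimics the 50/50 branching with a PRNG, so it is a simulation of the statistics only, not a quantum source):

    import random

    def beam_splitter_bits(n, p_transmit=0.5):
        """One bit per detected photon: transmitted (D1) -> 1, reflected (D0) -> 0."""
        return [1 if random.random() < p_transmit else 0 for _ in range(n)]

    print(beam_splitter_bits(16))

The p_transmit parameter models an imperfect coupling ratio; any deviation from 0.5 is exactly the kind of bias discussed above.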

Clicks in D0 would produce a 0 bit and clicks in D1 would produce a 1. Optical QRNGs using spatial superpositions usually apply variations on this basic scheme. In fact, in the original QKD application (Rarity et al., 1994) the random number generator was not fully implemented as a separate device controlling the measurement basis in the receiver. Instead, they used a passive implementation where the beam splitter took the input state and sent it with equal probability to one of two measurement setups, one for each possible basis. A complete implementation with a beam splitter and two photomultiplier tubes as detectors was first deployed as a subsystem in the experimental implementation of a Bell test (Weihs et al., 1998) and later developed as a standalone device (Jennewein et al., 2000) with some modifications. The most important difference is the way the random sequence is created, with a random digital signal as an intermediate step. In the modified model, detections in D1 take a digital signal to a high level and detections in D0 to a low level. The result is a random signal with changes on a time scale of the order of the inverse of the mean photon detection rate. If we sample this signal with a clock at a frequency sufficiently below the photon detection rate, assigning a binary 0 when the state is low and a 1 when it is high, we obtain a constant stream of random bits. The same procedure was tested with polarized photons in a linear 45° state and a polarizing beam splitter, with essentially the same results.
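
A toy simulation of this sampled-signal variant (the rates and the 50/50 detector choice are illustrative assumptions): detections toggle the level, and a slower clock samples it.

    import random

    def sampled_bits(n, det_rate=1e6, clock_rate=1e5):
        """Sample a detection-driven level with a clock well below the detection rate."""
        bits, level, t = [], 0, 0.0
        next_tick = 1.0 / clock_rate
        while len(bits) < n:
            t += random.expovariate(det_rate)       # time of the next detection
            while next_tick < t and len(bits) < n:
                bits.append(level)                  # the clock reads the level
                next_tick += 1.0 / clock_rate
            level = random.getrandbits(1)           # D1 click -> high, D0 -> low
        return bits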

In (Wang et al., 2006) there is an alternative take on polarization to path con- version with a weak laser source with linear polarization attenuated to the single photon level and a Fresnel prism that separates the positive and negative circular polar- ization components and directs them to two avalanche photodiodes. This kind of polarization generator can be modified to provide adjustable probabilities for each bit value if we include an electronically controlled polarizer at the source, like in the fiber-based QRNG of (Xu et al., 2015) or the decision making system in (Naruse et al., 2015), which adapts the probability to previous results.

QRNGs with optical path branching can show a few problems. All types of photodetectors have some kind of dead time after a click. This can generate anticorrelation of neighbouring bits: a detection at some time makes it less likely to find a photon immediately after, due to the “blunted” sensitivity of the detector before full recovery. Also, for real detectors and beam splitters we will find slightly different detection efficiencies and coupling ratios that can introduce some bias. There are a few other concerns: afterpulsing can create correlated bits, pulses with multiple photons can produce simultaneous detections, and the presence of dark counts means there will be occasional clicks when there are no photons. In practice, these effects, particularly dead time, limit the maximum generation rate to a few Mbps, which could be improved with detectors with a smaller recovery time.
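
The anticorrelation caused by dead time is easy to reproduce in a toy model (parameters illustrative): photons that arrive while their detector is still recovering are simply lost, so the other detector is more likely to yield the next bit.

    import random

    def bits_with_dead_time(n, rate=1e6, dead=1e-6):
        t, recover = 0.0, [0.0, 0.0]         # recovery times of D0 and D1
        bits = []
        while len(bits) < n:
            t += random.expovariate(rate)    # next photon arrival
            d = random.getrandbits(1)        # splitter: photon goes to D0 or D1
            if t >= recover[d]:              # detector live -> click -> bit
                bits.append(d)
                recover[d] = t + dead
        return bits

    b = bits_with_dead_time(100_000)
    flips = sum(b[i] != b[i + 1] for i in range(len(b) - 1)) / (len(b) - 1)
    print(flips)   # noticeably above 0.5: neighbouring bits are anticorrelated
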
There are many ways to counteract these problems. For instance, the generator in (Jennewein et al., 2000) includes a setup phase in which the tube voltage and the detection threshold of the photodetectors can be adjusted to compensate for detection efficiency and path coupling differences. Another popular method is to apply an unbiasing algorithm that distils a random sequence at the cost of losing some bits.
