What is mathematical chance


Introduction

Stochastics is the sub-discipline of mathematics that deals with chance. Statistics was once somewhat boldly called the "physics of mathematics" and was often regarded as supposedly inexact. This attitude has now largely changed. Purely descriptive statistics was joined by inferential statistics and by probability theory; statistics and probability theory were then combined under the newer term stochastics.

Mathematics presupposes a notion of chance and provides computational models with which random events can be quantified. So far, mathematics has not attempted a fundamental clarification of what chance actually is. Nevertheless, experimenting and calculating with chance has been very fruitful: without getting entangled in fundamental debates, important insights into chance were gained that philosophy could not provide.

Stochastics is the generic term for statistics and probability theory

Random experiment

In probability theory, an experiment that does not necessarily produce the same result when repeated under the same conditions is called a random experiment. An experiment is understood here as a process with an unpredictable but observable result, for example the toss of a coin or a die.

Although the result of each individual experiment is random, regularities can be recognized if the experiment is repeated sufficiently frequently. The variables of interest in a random experiment are called random variables.

Example of a random experiment

The stages of a random experiment are as follows:

  1. Before the experiment: at least two results are possible, but nothing has been decided yet.
  2. The random experiment is carried out.
  3. After the experiment: one of the at least two possible results has been selected at random.

The simplest random experiment has two possible outcomes that have the same probability.

You can carry out this kind of random experiment with a coin and generate random numbers yourself: assign the number 0 to one side of the coin and the number 1 to the other. By recording many toss results you obtain a sequence of 0s and 1s. Such a sequence is the result of a very simple random process.

The random sequences of 0 and 1 obtained in this way can easily be examined statistically. In doing so, one can determine properties of these random sequences which do not occur in the case of non-random sequences (i.e. sequences that were generated deterministically according to some law). In this way you can check sequences of numbers for true randomness.

Conspicuous statistical deviations from pure random sequences can be used, for example, to expose scientific fraud: real measurements always contain a random measurement error, while invented "random" errors often deviate significantly from true randomness precisely because of the attempt to make them appear as random as possible.

The longer a sequence of numbers, the more clearly it can be decided whether it is random or non-random. In theory a random experiment can also deliver a hundred zeros in a row, but this is so unlikely that in such a case one may rightly suspect a regularity. On the other hand, there are deterministic algorithms whose results are very similar to those of a random experiment, so-called pseudo-random generators. With good pseudo-random generators one needs a very long series of numbers to recognize the difference from real randomness. Random numbers are occasionally required in computer science. Computing them with the computer itself is possible, but only really convincing for shorter sequences; with longer sequences the non-randomness comes to light at some point.

A 0-1 sequence that depicts reality is not always purely deterministic or purely random; often it is a mixture of both. A simple example: determine one digit by flipping a coin, the next as the difference of the two preceding digits, then flip the coin again, and so on. By examining such sequences one gets a fairly good understanding of chance and of the mixture of the random and the non-random, as it is often found in reality.
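A mixed sequence of this kind is easy to simulate. The following is a minimal sketch; the function name and the use of the absolute difference (which keeps binary digits binary) are assumptions for illustration:

```python
import random

def mixed_sequence(n, seed=None):
    """Generate a 0-1 sequence that alternates a random step with a
    deterministic one: a digit is drawn by coin flip, the next is the
    (absolute) difference of the two preceding digits, then a coin is
    flipped again, and so on."""
    rng = random.Random(seed)
    seq = [rng.randint(0, 1), rng.randint(0, 1)]  # two digits to start
    while len(seq) < n:
        # deterministic step: difference of the two preceding digits
        seq.append(abs(seq[-1] - seq[-2]))
        if len(seq) < n:
            seq.append(rng.randint(0, 1))  # random step: coin flip
    return seq[:n]

print("".join(map(str, mixed_sequence(40, seed=1))))
```

Such sequences can then be compared statistically with purely random ones.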

An elementary random event is based on equality and inequality

  • The two possible variants must be the same (that is, equally likely).
  • Nevertheless, they must somehow be unequal, namely be distinguishable.

(Coin: both sides must be able to appear with the same probability, but both sides must be minted (or colored, etc.) differently, otherwise it would not be possible to distinguish between them.)

Basic terms in mathematical chance

In the case of a random event that is to be considered and evaluated mathematically, three terms have to be distinguished:

  • Number of possibilities
    • which and how many alternative results are there
    • this is also called the event space of a random event
  • probability
    • how likely is each outcome
  • Amount of random information = entropy in mathematics
    • how much randomness, i.e. entropy, the event contains

The best way to understand this is to look at a simple example:

The following applies to the ideal one-time coin toss:

  • Number of possibilities: 2
  • Probability of each possibility: 0.5
  • Entropy = log2(2) = 1 bit

If the coin is tossed twice in a row, the following applies:

  • Number of possibilities: 4
  • Probability of each possibility: 0.25
  • Entropy = log2(4) = 2 bits

Entropy is more closely related to the number of possibilities than to the probability: it is derived from the number of possibilities and is only made smaller and more manageable by taking the logarithm.
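The two entropy calculations above can be reproduced directly; a minimal sketch:

```python
import math

def entropy_bits(num_possibilities):
    """Entropy of a uniform random event: log2 of the number of
    equally likely outcomes."""
    return math.log2(num_possibilities)

print(entropy_bits(2))  # one coin toss   -> 1.0 bit
print(entropy_bits(4))  # two coin tosses -> 2.0 bits
```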

Chance quantitatively

In the formal world of mathematics, abstract structures can be defined that are motivated by human imagination or the expectation of chance. Games of chance motivated the first mathematical theories of probability and are still often used to illustrate them today.

The following terms are central to the formal description of chance:

(Random) experiment: the operations performed and/or observed (for example, throwing a die twice).
Result or elementary event: a single observation (for example, first throw '3', second throw '5').
Event: a set composed of elementary events (the event "even number thrown" is composed of the elementary events "2, 4 or 6 thrown").
Probability: each elementary event is assigned a numerical value between 0 (never occurs) and 1 (always occurs) (e.g. uniform distribution: the probability for each face of the die is the same, namely 1/6). In the case of a continuum of possible results, one speaks of a probability distribution.

Obviously, only those random experiments are interesting that have more than one possible result.

Statistics tries to determine the underlying probability distribution for a given random experiment.

Random variable (random quantity)

A random variable (or random quantity) is a mathematical quantity that takes on different values depending on the outcome of a procedure that is regarded as random. The procedure can be a draw, the calculation of a pseudo-random number, or the measurement of a quantity that is statistically distributed and/or subject to measurement error.

For example, the number of pips on a die is a random variable that can take the values 1, 2, 3, 4, 5 or 6.

If the set of possible values of a random variable is finite (as with the die) or countably infinite, the random variable is called discrete. If the set of values is uncountable, i.e. typically consists of real numbers (as in the idealized measurement of a physical quantity), the random variable is called continuous.

The probabilities of the possible values ​​of a discrete random variable form a probability distribution.

When two dice are rolled, the probability of obtaining the total number of pips Z follows, for example, the probability distribution

P(Z = z) = (6 − |z − 7|) / 36,  z = 2, 3, …, 12.
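The distribution of the total number of pips of two dice can also be obtained by enumerating all 36 equally likely outcomes; a minimal sketch:

```python
from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely outcomes of two dice and count each sum.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
dist = {z: Fraction(c, 36) for z, c in sorted(counts.items())}

for z, p in dist.items():
    print(z, p)
```

The most likely sum is 7 (probability 6/36); the probabilities fall off symmetrically toward 2 and 12.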

An individual value of a continuous random variable, on the other hand, cannot be assigned a positive probability; one has to work with probability densities and/or cumulative distribution functions.

In a more abstract formulation of probability theory, a random variable is a measurable function from a probability space into a measure space. Normally, the set of real numbers, equipped with the Borel σ-algebra, is chosen as the image space.

Random numbers

In statistics, a random number is the value of a random variable in a random experiment. The result of each experiment is independent of the previous results.

Random numbers are required in various statistical methods, e.g. in the selection of a representative sample from a population, in the assignment of test animals to different test groups (randomization), in Monte Carlo simulation, and others.

There are various methods for generating random numbers. Real random numbers are generated with the help of physical phenomena: coin tosses, dice, roulette, the noise of electronic components, or radioactive decay processes. However, these procedures are quite time-consuming. In real applications, a sequence of pseudo-random numbers is usually sufficient, i.e. apparently random numbers generated by a fixed, reproducible process. They are thus not really random, but have statistical properties (uniform frequency distribution, low correlation) similar to those of real random number sequences.

See also http://de.wikipedia.org/wiki/Vertteilung_von_Zufallszahlen

Pseudo random numbers

The monkey at the typewriter, a famous random number generator

Pseudo-random numbers are sequences of numbers that are calculated by a deterministic algorithm, a pseudo-random number generator, and are therefore not random, but look random (for sufficiently short sequences). Each time the calculation is started with the same starting value, the same sequence of numbers is generated, which is why these numbers are far from really random.

Randomness is judged by statistical properties of the number sequence, such as equal probability of the individual numbers and statistical independence between different numbers of the sequence. The quality of a pseudo-random number generator is determined by how well these statistical requirements are met.

A sequence of pseudo-random numbers is calculated by deterministic algorithms from a truly randomly selected starting value (seed). Such a starting value can, for example, be the system time of the computer in milliseconds at the moment it was last switched on. The resulting sequence has the property that it is difficult to predict the next numbers from just a few numbers of the sequence: a sequence of pseudo-random numbers "looks random".

Properties of pseudo-random number algorithms

Some random number algorithms are periodic. Even though it would usually be better to use non-periodic algorithms, the periodic ones are often much faster, and their period can be made as large as desired by a clever choice of parameters, which is why they are often clearly superior to the non-periodic ones in practice. Some pseudo-random number generators are also only finite, i.e. they cannot produce arbitrarily many numbers (in this sense they are related to the periodic ones).

Three examples of pseudorandom number generators

finite generator

To generate a sequence of numbers between … and …, choose one … greater than …, a … greater than …, and not divisible by small prime numbers (where "small" here means: less than …).

periodic generator

Take starting numbers m, a, c and x0, where m is the largest of these numbers, and repeatedly apply the formula

xn+1 = (a · xn + c) mod m.

What is xn? It is the number that is put into the formula to obtain the next value xn+1. The calculation starts with x0, from which x1 is computed, then x2, and so on.
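The periodic generator described here is, in standard terminology, a linear congruential generator. A minimal sketch follows; the parameter values m = 16, a = 5, c = 1 are illustrative assumptions, chosen small so that the period is visible:

```python
def lcg(m, a, c, x0, n):
    """Linear congruential generator: x_{k+1} = (a*x_k + c) mod m.
    Returns the first n values after the seed x0."""
    xs = []
    x = x0
    for _ in range(n):
        x = (a * x + c) % m
        xs.append(x)
    return xs

# Small illustrative parameters: the sequence visits all 16 residues,
# then repeats with period 16.
print(lcg(m=16, a=5, c=1, x0=0, n=20))
```

With these parameters the generator attains the full period m = 16, after which the sequence repeats exactly.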

Another example is the Mersenne Twister.

non-periodic / infinite generator

Take the decimal places of the root of an integer (one whose root is irrational, otherwise the expansion terminates) as random numbers.
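A minimal sketch of this idea, using Python's decimal module; the choice of the integer 2 (square root of 2) is an illustrative assumption:

```python
from decimal import Decimal, getcontext

def sqrt_digits(n, count):
    """Return `count` decimal digits of sqrt(n) after the decimal point,
    usable as pseudo-random digits (n must not be a perfect square)."""
    getcontext().prec = count + 10  # a few guard digits
    s = str(Decimal(n).sqrt())
    return s.split(".")[1][:count]

print(sqrt_digits(2, 30))
```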

Use of pseudo-random numbers

Pseudo-random numbers are used, among other things, in computer simulation, in which statistical processes are reproduced with the help of software. The reproducibility of pseudo-random numbers can also be useful for debugging computer programs. On the other hand, this very property makes pseudo-random numbers unusable for certain applications (in cryptography, for example, one has to be careful not to use pseudo-random numbers in the wrong places). Pseudo-random numbers are also used in noise generators.


Another advantage of pseudo-random numbers is that they can be generated on any computer without recourse to external data (which makes them interesting again for certain areas of cryptography despite the disadvantages mentioned above). To generate true random numbers, one needs either a real random number generator (e.g. by digitizing noise or by taking advantage of quantum effects) or at least a source of quasi-random (usually unpredictable) events such as times of user input or network activity.


The law of large numbers

The law of large numbers states that the relative frequency of a random result approaches the theoretical probability of this result ever more closely, the more often the random experiment is carried out.


Example: tossing a coin

Number of tosses | Heads (theoretical) | Heads (observed) | Ratio heads/total (theoretical) | Ratio (observed) | Relative distance
100              | 50                  | 48               | 0.5                             | 0.48             | 0.02
1000             | 500                 | 491              | 0.5                             | 0.491            | 0.009
10000            | 5000                | 4970             | 0.5                             | 0.497            | 0.003

The observed heads/total ratio (0.48 → 0.491 → 0.497) approaches the theoretical ratio of 0.5 as the number of tosses increases.
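The table can be reproduced in spirit (not with the exact observed counts, which depend on the particular tosses) by simulating fair coin tosses; a minimal sketch:

```python
import random

def head_ratio(n_tosses, seed=0):
    """Relative frequency of heads in n fair coin tosses."""
    rng = random.Random(seed)
    heads = sum(rng.randint(0, 1) for _ in range(n_tosses))
    return heads / n_tosses

# The ratio drifts toward the theoretical value 0.5 as n grows.
for n in (100, 1000, 10000, 100000):
    print(n, head_ratio(n))
```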

The Gaussian distribution

The normal distribution

When a physical quantity is measured, a characteristic bell-shaped distribution of the individual measured values around the mean value often results. To obtain this bell curve, one and the same measurement is repeated very often and the values obtained are sorted into columns of equal width according to their frequency.

If one then obtains such a bell curve, one can assume that only chance is responsible for the deviation from the mean value during the measurement and that the measurement results obtained are normally distributed.

In probability theory, the Gaussian bell curve is one of the most important distributions of random variables.

In order to understand the normal distribution, you should experiment with it a little or program it yourself in a simple programming language with graphical output. Since the Greek letters in the formulas may put you off, they are replaced here by letters from the familiar Latin alphabet.
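A minimal sketch of the bell curve with the Latin letters m (mean) and s (standard deviation) in place of the usual Greek μ and σ; the crude text plot is only for illustration:

```python
import math

def normal_density(x, m, s):
    """Gaussian bell curve with mean m and standard deviation s."""
    return math.exp(-((x - m) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

# Crude text plot of the bell curve around m = 0, s = 1.
for x in [i / 2 for i in range(-6, 7)]:
    print(f"{x:5.1f} {'*' * round(80 * normal_density(x, 0, 1))}")
```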


Random sequence

A random sequence is created through the repeated execution of a statistical experiment; in general, it is a sequence of realizations of a random variable. The term is mostly used in the sense of a sequence of characters randomly selected from a certain alphabet or set of numbers.

The simplest random sequence is obtained by repeatedly tossing a coin, assigning 0 to one side of the coin and 1 to the other. Other random sequences can be recoded (dichotomized) into such a simple 0-1 sequence without losing their random character.

Example: 1011011010101001110010110011100000011110010100001111010100010011011110110000100010 1010001110111001010111011111110000010011010000110111011110101011000001000111011000 1000000100111110000011111010010001101111001010100000101101000011000110100011001111 0111110001101110010011000000111110010000001100001000000110101010000011000101100001 1100111100100001101111111100100101010011111001000100100001001001000010001010011100 1111011000001010011111110010111110111011000111011010110000011101100111101011001110

This sequence was obtained by repeatedly tossing a coin. It is noticeable how often longer connected runs of 0s or 1s occur.

Alonzo Church called a sequence of numbers random if it cannot be computed, or if the shortest computer program that computes the sequence is longer than the sequence itself.

A random sequence is characterized by a vanishing serial correlation (autocorrelation), i.e. the correlation coefficient between successive values of the sequence is not significantly different from zero.
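The vanishing lag-1 autocorrelation can be checked directly; in this minimal sketch the function name and the seed are illustrative assumptions:

```python
import random

def lag1_autocorrelation(seq):
    """Sample correlation between successive values of a sequence."""
    n = len(seq)
    mean = sum(seq) / n
    num = sum((seq[i] - mean) * (seq[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in seq)
    return num / den

rng = random.Random(42)
coin = [rng.randint(0, 1) for _ in range(10000)]
print(lag1_autocorrelation(coin))           # near 0 for a random sequence
print(lag1_autocorrelation([0, 1] * 5000))  # strongly negative: deterministic alternation
```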

Many naturally occurring signals that are discrete in time or space (e.g. DNA, see also DNA sequence analysis) are analyzed statistically by first postulating the null hypothesis of an underlying random process. If this hypothesis can be refuted, i.e. if there are correlations in the sequence, these may point to useful information hidden in the sequence. Especially in the case of dichotomous sequences, randomness can be checked with the help of the run test, where "run" denotes a maximal subsequence of identical characters. The test leads to rejection if there are too few, but also if there are too many, runs in a sequence.

Run test for randomness of a sequence

see also Run test

The run test (also runs test, Wald-Wolfowitz test, or iteration test) is a nonparametric test for the randomness of a sequence. Conceptually, a two-part (dichotomous) population is assumed, for example an urn model with two kinds of balls, from which n balls are drawn with replacement. The aim is to test the hypothesis that the drawing occurred at random.

Procedure

n balls are drawn from a dichotomous population, and the results are available in their chronological order. All neighboring results of the same value are now combined into one run (a series of identical draws). If the sequence is really random, there should be neither too few nor too many runs.

Example: ball color white is recorded as a zero, ball color black as a one. Suppose the following order is drawn: 0011101110000010100110111010011000011100110010111001110111001010011110010101011000001111000000110111111001011011 00101101100111111001101110010111000111100010010111000001101110001001010101101110110111000

The first run consists of two white balls (00), then comes a run of three black balls (111), then a single white ball (0), and so on.

The hypothesis is made: the balls were drawn at random.

To determine the number of runs for which the hypothesis is rejected, the distribution of the number of runs is required. Let n1 be the number of balls of the first kind and n2 = n − n1 the number of the second kind, and let r be the number of runs. By the principle of symmetry, every possible order of randomly drawn balls has the same probability. There are altogether

C(n, n1) = n! / (n1! · n2!)

possible orders of drawing.

With regard to the distribution of the number of runs, a distinction is made between the following cases:

  1. The number of runs r = 2k is an even number:
    There are k runs of balls of the first kind and k runs of balls of the second kind. The probability that exactly r runs occur is then
    P(R = r) = 2 · C(n1 − 1, k − 1) · C(n2 − 1, k − 1) / C(n, n1).
  2. The number of runs r = 2k + 1 is odd:
    There are either k + 1 runs of the first kind and k runs of the second kind, or the other way round. The probability that exactly r runs occur is then the sum of these two possibilities:
    P(R = r) = [C(n1 − 1, k) · C(n2 − 1, k − 1) + C(n1 − 1, k − 1) · C(n2 − 1, k)] / C(n, n1).

If r is too small or too large, the null hypothesis is rejected. At a significance level of alpha, H0 is rejected if the test statistic r satisfies

r ≤ r(alpha/2)  or  r ≥ r(1 − alpha/2),

with r(p) as the quantile of the distribution of R at the point p, whereby the principle of conservative testing is applied here. Since the calculation of the critical values of r for the rejection of the hypothesis is cumbersome, a table is often used.

Simple example

For a panel discussion with two political parties, the order of the speakers was supposedly determined by chance. There were 4 representatives from the Supi party and 5 from the Toll party. The order of the speakers was given as follows:

S S T S T T T S T

A representative from Toll complained that S was preferred. A run test was carried out:

We have n1 = 4 and n2 = 5, and r = 6 runs were obtained.

According to the table of the run test, H0 is rejected if r ≤ 2 or r ≥ 9. The test statistic r = 6 therefore lies in the non-rejection region; one can assume that the order of the speakers is random.
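Counting runs is straightforward; a minimal sketch, applied to the speaker order above:

```python
from itertools import groupby

def count_runs(seq):
    """Number of runs (maximal blocks of identical symbols) in a sequence."""
    return sum(1 for _key, _grp in groupby(seq))

speakers = "SSTSTTTST"
print(count_runs(speakers))  # -> 6
```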

Additions

Parameters of the distribution of R

The expected value of R is

E(R) = 1 + 2 · n1 · n2 / n

and the variance is

Var(R) = 2 · n1 · n2 · (2 · n1 · n2 − n) / (n² · (n − 1)).
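The standard formulas for the expected value and variance of the number of runs can be evaluated for the panel example (n1 = 4, n2 = 5); the function name is an illustrative assumption:

```python
def run_moments(n1, n2):
    """Expected value and variance of the number of runs R under the
    null hypothesis of a random arrangement:
    E(R) = 1 + 2*n1*n2/n,  Var(R) = 2*n1*n2*(2*n1*n2 - n) / (n^2 * (n-1))."""
    n = n1 + n2
    mean = 1 + 2 * n1 * n2 / n
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return mean, var

# Panel discussion example: 4 Supi and 5 Toll speakers.
print(run_moments(4, 5))
```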

Population with more than two values of the characteristic

If there is a sequence of real numbers xi of a metric characteristic, the sequence is dichotomized: the median z of the sample is determined; values xi < z count as balls of the first kind, values xi > z as balls of the second kind. This dichotomized sequence can then be tested for randomness as above.

If there is a non-numerical symbol sequence with more than two distinct symbols, a numerical series must first be generated; the problem here may be that the symbols cannot be ordered.

Normal approximation

For sample sizes n1, n2 > 20, the number of runs R is approximately normally distributed with the expected value and variance given above. This yields the standardized test statistic

z = (r − E(R)) / √Var(R).

The hypothesis is rejected if

z ≤ −z(1 − alpha/2)  or  z ≥ z(1 − alpha/2),

with z(p) as the quantile of the standard normal distribution for the probability p.