What does a negative kurtosis mean

Descriptive statistics and probability theory

3.6 Measures of symmetry and kurtosis

Explanatory video for section 3.6 (slides 88-96)

After a short detour to graphical representations, we now turn to two further measures used to characterize cardinal-scaled features or their original lists. So far we have seen how key figures for the location and the spread of the feature values can be formed; now we add key figures for assessing the symmetry (or rather asymmetry) and the so-called kurtosis of original lists.

Symmetry and skewness

First of all, let us consider what symmetry should mean for an original list or a feature. In common parlance, an object is called symmetrical if its mirror image cannot be distinguished from the object itself. An analogous definition suggests itself in view of the already familiar option of displaying an original list as a bar diagram: a feature is called symmetrical if its associated bar diagram is axially symmetric about a line parallel to the \(y\) axis.

It can be shown that in the case of such a symmetry the axis of symmetry must lie exactly at the point \(\overline{x}\). Accordingly, a feature is symmetric (about \(\overline{x}\)) if and only if, apart from the order, the list formed from the values \(2\cdot\overline{x}-x_i\) for \(i\in\{1,\ldots,n\}\) does not differ from the original list (or, equivalently, the frequency distribution of the feature \(X\) does not differ from the frequency distribution of the “transformed” feature \(2\cdot\overline{x}-X\) formed as described). By simple substitution, one also obtains the equivalent characterization via the agreement (apart from the order) of the lists of \(x_i-\overline{x}\) and \(\overline{x}-x_i\) (or of the correspondingly formed features \(X-\overline{x}\) and \(\overline{x}-X\)).
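The characterization above can be tried out directly. The following is a minimal sketch, not part of the original notes: the function name `is_symmetric` and the example lists are chosen here purely for illustration. It mirrors every entry at the arithmetic mean and compares the result with the original list, ignoring the order.

```python
from collections import Counter
from fractions import Fraction

def is_symmetric(values):
    """Return True if the list is symmetric about its arithmetic mean.

    Hypothetical helper for illustration: the mirrored list 2*mean - x_i
    must agree with the original list apart from the order.
    """
    xs = [Fraction(v) for v in values]      # exact arithmetic avoids rounding issues
    mean = sum(xs) / len(xs)
    mirrored = [2 * mean - x for x in xs]
    # Counter compares the lists as multisets, so the order does not matter.
    return Counter(xs) == Counter(mirrored)

print(is_symmetric([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # True: symmetric about 5
print(is_symmetric([1, 1, 1, 2, 5]))              # False: mirrored list is 3, 3, 3, 2, -1
```

Exact fractions are used so that the comparison does not fail merely because of floating-point rounding.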

If a feature does not have this symmetry property, one is interested in a measure of the degree of asymmetry. For this purpose, the so-called empirical skewness (or simply skewness) according to Definition 3.7 below is used.

Definition 3.7 (empirical skewness)

Let \(X\) be a feature with the original list \(x_1,\ldots,x_n\). Then \[\text{skewness}(X) := \frac{1}{n}\sum_{i=1}^n \left(\frac{x_i-\overline{x}}{s}\right)^3\] with \(\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i\) and \(s = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i-\overline{x})^2}\) is called the empirical skewness of \(X\).

It is easy to see that this key figure measures skewness at least insofar as it always takes the value \(0\) for symmetrical features: because the lists of \(x_i-\overline{x}\) and \(\overline{x}-x_i\) then agree (apart from the order), we obviously have \(\text{skewness}(X) = -\text{skewness}(X)\), which is only possible for \(\text{skewness}(X) = 0\).
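As an illustration, Definition 3.7 can be transcribed almost literally into code. This is a sketch under stated assumptions: the function name `empirical_skewness` and the three example lists are not from the original text, and the lists were chosen only to show the three possible signs.

```python
import math

def empirical_skewness(values):
    """Empirical skewness as in Definition 3.7 (divisor n, not n - 1).

    Note: a constant list has s = 0, so the skewness is undefined there
    and this sketch would raise ZeroDivisionError.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / s) ** 3 for x in values) / n

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 1, 2, 5]            # many small values, one large value
left_skewed = [-x for x in right_skewed]  # mirror image of the list above

print(empirical_skewness(symmetric))     # zero up to rounding error
print(empirical_skewness(right_skewed))  # positive
print(empirical_skewness(left_skewed))   # negative, same magnitude
```

Mirroring a list flips the sign of every deviation from the mean, so the third powers, and with them the skewness, change sign as well.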

It can also be seen that the skewness measure defined in this way (unlike the empirical variance) can take both positive and negative values; it therefore quantifies not only the strength but also the direction of the skew. The following figure compares bar diagrams of a symmetrical feature, a feature with positive skewness, and a feature with negative skewness.

Figure 3.14: Example: empirical skewness of features

In the present example, the feature with positive skewness shows a relatively strong concentration of “small” feature values on the left side of the bar diagram, combined with a wider spread of the “large” feature values on the right side. The opposite holds for the feature with negative skewness: it shows a wider spread of the “small” original list entries on the left side, combined with a high concentration of the “large” original list entries on the right side of the bar diagram.

According to the visual impression of the steeper or flatter rise or fall of the bars in the bar diagram on the left or right side, the following terms have been established:

A feature \(X\) is called

  • left-steep or right-skewed if \(\text{skewness}(X) > 0\), and

  • right-steep or left-skewed if \(\text{skewness}(X) < 0\).

Also of interest is how the relative position of the arithmetic mean \(\overline{x}\) and the median \(x_{\text{med}}\) (which is uniquely determined by the known convention) relates to the asymmetry of a feature: for symmetrical features we always have \(\overline{x} = x_{\text{med}}\), while left-steep features tend to have \(\overline{x} > x_{\text{med}}\) and right-steep features tend to have \(\overline{x} < x_{\text{med}}\).
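The relationship between mean and median can be checked on small examples. The sketch below uses Python's standard `statistics` module; the three lists are assumptions made for illustration and have odd length, so the median is simply the middle entry and no convention for even-length lists is needed. The relationship for skewed features is only a tendency, so other lists may behave differently.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
left_steep = [1, 1, 1, 2, 5]        # positive skewness
right_steep = [-5, -2, -1, -1, -1]  # negative skewness (mirror image)

for name, data in [("symmetric", symmetric),
                   ("left-steep", left_steep),
                   ("right-steep", right_steep)]:
    # Print mean and median side by side for comparison.
    print(name, "mean:", mean(data), "median:", median(data))
```

In these examples the mean is pulled towards the long tail, while the median stays with the bulk of the entries.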

Kurtosis

The next key figure for characterizing cardinal-scaled features is the so-called kurtosis. It is used to assess whether the original list entries tend to spread evenly around the center of the frequency distribution, or whether comparatively many entries lie very close to the center, combined with a few (possibly only single) feature values at a large distance from the center.

The notion of kurtosis is best explained by looking at the associated bar diagrams or histograms, where there is often a clear connection between this key figure and the shape of the diagram. Features with a small kurtosis have more evenly distributed feature values, leading to flatter peaks in bar diagrams and histograms. Features with a large kurtosis have especially many values near the center and some very distant values, leading to steeper peaks (combined with some sparsely populated values or classes at the edge), as can be seen in the following example.

Figure 3.15: Example: features with different empirical kurtosis

For the usefulness and the interpretation of the above comparison it is important that both features have matching (arithmetic) means and variances, and that the scaling of the \(x\) axis has been chosen so that the sparsely populated classes (recognizable only on closer inspection) are just shown at the outer edge. The “proximity” of the original list entries to the center used in the explanation above must therefore always be assessed relative to the variance or standard deviation of the feature. This is also reflected in the calculation rule of the following Definition 3.8, which is otherwise similar to the definition of skewness.

Definition 3.8 (empirical kurtosis)

Let \(X\) be a feature with the original list \(x_1,\ldots,x_n\). Then \[\text{kurtosis}(X) := \frac{1}{n}\sum_{i=1}^n \left(\frac{x_i-\overline{x}}{s}\right)^4\] with \(\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i\) and \(s = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i-\overline{x})^2}\) is called the empirical kurtosis of \(X\).
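Definition 3.8 can be sketched in code in the same way as the skewness. The function name `empirical_kurtosis` and the example lists below are assumptions made for illustration: one list is spread evenly, the other has most entries at the center and two distant ones.

```python
import math

def empirical_kurtosis(values):
    """Empirical kurtosis as in Definition 3.8 (divisor n, not n - 1).

    Note: a constant list has s = 0, so the kurtosis is undefined there.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / s) ** 4 for x in values) / n

evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]  # many central values, two distant ones

print(empirical_kurtosis(evenly_spread))  # about 1.77
print(empirical_kurtosis(peaked))         # about 4.5
```

The evenly spread list yields a small value, while the list with a few distant entries yields a clearly larger one, because the fourth power weights large standardized deviations very heavily.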

For the quantitative classification of a calculated kurtosis, note first that a kurtosis is not only obviously always nonnegative: it can also be shown that the kurtosis of a feature is always at least \(1\) and can otherwise, in principle, be arbitrarily large. A special value of the kurtosis is \(3\), which forms the boundary between low and high kurtosis (why this is the case will become somewhat clearer later in the course). Taking this boundary into account, the following characterization of features by their kurtosis is common:

A feature \(X\) is called

  • platykurtic or flat-topped if \(1 \le \text{kurtosis}(X) < 3\), and

  • leptokurtic or steep-peaked if \(\text{kurtosis}(X) > 3\).
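The lower bound \(1\) mentioned above can be verified with a short calculation. As a sketch, abbreviate the standardized deviations by \(y_i = (x_i-\overline{x})/s\) (a notation used here only for this calculation), so that \(\frac{1}{n}\sum_{i=1}^n y_i^2 = 1\) by the definition of \(s\). Since a mean of squares cannot be negative, \[0 \le \frac{1}{n}\sum_{i=1}^n \left(y_i^2-1\right)^2 = \frac{1}{n}\sum_{i=1}^n y_i^4 - 2\cdot\frac{1}{n}\sum_{i=1}^n y_i^2 + 1 = \text{kurtosis}(X) - 2 + 1 = \text{kurtosis}(X) - 1,\] and hence \(\text{kurtosis}(X) \ge 1\). Equality holds exactly when \(y_i^2 = 1\) for all \(i\), that is, when every original list entry lies at distance \(s\) from \(\overline{x}\), as for the list \(-1, 1\).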

Attention: care is always needed when interpreting a reported kurtosis, because it is also widespread to report, instead of the (actual) kurtosis, the so-called excess kurtosis (or, shorter, the excess), which is calculated as \(\text{kurtosis}(X)-3\). Unfortunately, this quantity is occasionally also simply called kurtosis, which creates a considerable risk of confusion.
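The following sketch illustrates the difference between the two conventions. The function names and example lists are assumptions made for illustration. It also shows how a “negative kurtosis” can appear in reports even though the kurtosis of Definition 3.8 is never below \(1\): the reported value is then the excess.

```python
import math

def empirical_kurtosis(values):
    """Empirical kurtosis as in Definition 3.8 (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / s) ** 4 for x in values) / n

def excess_kurtosis(values):
    """Excess kurtosis: the kurtosis shifted by the boundary value 3."""
    return empirical_kurtosis(values) - 3

evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]

print(excess_kurtosis(evenly_spread))  # about -1.23: platykurtic
print(excess_kurtosis(peaked))         # about 1.5: leptokurtic
```

When using a statistics library, it is worth checking which convention it follows. For example, `scipy.stats.kurtosis` returns the excess by default (`fisher=True`) and the kurtosis in the sense of Definition 3.8 only with `fisher=False`.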

For a better understanding, it is worth taking another look at the calculation rules not only for the empirical kurtosis but also for the empirical skewness and variance. A comparison of the corresponding formulas \[s^2 = \frac{1}{n}\sum_{i=1}^n \left(x_i-\overline{x}\right)^2, \quad \text{skewness}(X) = \frac{1}{n}\sum_{i=1}^n \left(\frac{x_i-\overline{x}}{s}\right)^3, \quad \text{kurtosis}(X) = \frac{1}{n}\sum_{i=1}^n \left(\frac{x_i-\overline{x}}{s}\right)^4\] makes the analogy visible: essentially, the empirical variance, skewness and kurtosis are means of the second, third and fourth powers of the deviations of the original list entries from \(\overline{x}\), where in the formulas for the empirical skewness and kurtosis the spread of the original list entries has additionally been “factored out”.

It is easy to see that this “factoring out” is absolutely necessary for a sensible use of these key figures. If, for example, all entries in the original list are multiplied by the factor \(2\), then an “unstandardized” version of the empirical skewness would grow by the factor \(8\) and an “unstandardized” version of the empirical kurtosis by the factor \(16\). This is compensated only by the standard deviation \(s\), which also grows by the factor \(2\) and whose corresponding power appears in the denominator of the calculation rules. Such a “stretching” of the original list only changes the scaling of the \(x\) axis in the bar diagram or histogram; the shape in terms of skewness or kurtosis remains completely unchanged, so the values of the associated key figures should of course not change either.
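The effect of stretching can be checked numerically. In the sketch below, `central_moment` is a hypothetical helper that computes the “unstandardized” mean of the \(k\)-th powers of the deviations. The standardized key figures are obtained from it by dividing by the corresponding power of \(s\), which is equivalent to Definitions 3.7 and 3.8. The example list is an assumption made for illustration.

```python
def central_moment(values, k):
    """Mean of the k-th powers of the deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    # s**3 equals the second central moment raised to the power 1.5
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # s**4 equals the squared second central moment
    return central_moment(values, 4) / central_moment(values, 2) ** 2

x = [1, 1, 1, 2, 5]
stretched = [2 * v for v in x]  # every entry multiplied by the factor 2

# Unstandardized versions grow by 2**3 = 8 and 2**4 = 16 (up to rounding).
print(central_moment(stretched, 3) / central_moment(x, 3))
print(central_moment(stretched, 4) / central_moment(x, 4))

# Standardized versions are unchanged (up to rounding).
print(skewness(x), skewness(stretched))
print(kurtosis(x), kurtosis(stretched))
```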

To make the empirical skewness and kurtosis of features with different spread comparable, the standardization of the (already “centered”) original list entries, carried out by dividing by the corresponding power of \(s\), is therefore indispensable. Centering and then dividing out the spread (which can often be interpreted as dividing out the unit) is also called standardization (or Studentization). Starting from a feature \(X\) with the original list entries \(x_i\), \(i\in\{1,\ldots,n\}\), the arithmetic mean \(\overline{x}\) and the empirical standard deviation \(s\), one can define the standardized feature \(Y\) with the original list entries \[y_i = \frac{x_i-\overline{x}}{s}, \qquad i\in\{1,\ldots,n\},\] which has arithmetic mean \(0\) and empirical standard deviation (as well as variance) \(1\), as is easily verified.

Using this standardized feature \ (Y \), the empirical skewness and kurtosis can then be represented in the following particularly simple form:

  • \(\text{skewness}(X) = \overline{y^3} := \frac{1}{n}\sum_{i=1}^n y_i^3\)

  • \(\text{kurtosis}(X) = \overline{y^4} := \frac{1}{n}\sum_{i=1}^n y_i^4\)
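The representation via the standardized feature can be sketched as follows. The helper names `standardize` and `mean_of_powers` and the example list are assumptions made for illustration. The sketch first checks that \(Y\) has mean \(0\) and variance \(1\), then reads off skewness and kurtosis as the means of the third and fourth powers of the \(y_i\).

```python
import math

def standardize(values):
    """Center the entries and divide by the empirical standard deviation (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / s for x in values]

def mean_of_powers(values, k):
    """Arithmetic mean of the k-th powers of the entries."""
    return sum(v ** k for v in values) / len(values)

x = [1, 1, 1, 2, 5]
y = standardize(x)

print(mean_of_powers(y, 1))  # mean of Y: zero up to rounding
print(mean_of_powers(y, 2))  # variance of Y (its mean is 0): one up to rounding
print(mean_of_powers(y, 3))  # empirical skewness of X
print(mean_of_powers(y, 4))  # empirical kurtosis of X
```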

Skewness and kurtosis in graphical representations

In this section we summarize how at least a tendency towards skewness and kurtosis can be inferred from the common graphical representations of features (such as box plots and histograms), even without calculating the corresponding key figures from the original list.

First of all, symmetrical features always produce symmetrical box plots. For example, the original list \[1,2,3,4,5,6,7,8,9,\] which is symmetric about \(5\), yields the following box plot:

Figure 3.16: Example: box plot of a symmetrical feature

If a feature is asymmetrical, the type of asymmetry or skew can be identified as follows:

  • For left-steep features, the right/upper part (right/upper part of the box and right/upper whisker) tends to have a larger extension than the left/lower part.

  • For right-steep features, the right/upper part (right/upper part of the box and right/upper whisker) tends to have a smaller extension than the left/lower part.

Box plots are particularly well suited for assessing the empirical kurtosis, because the outliers that are characteristic of leptokurtic features (with a large kurtosis) must be drawn separately here and are therefore easy to recognize. The following applies:

  • Features with a larger empirical kurtosis tend to have many outliers, i.e. separately drawn feature values outside the whiskers (depending on the skewness, at least on one side).

  • Features with a smaller empirical kurtosis often have few or no outliers at all.
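The connection between kurtosis and outliers can be illustrated by counting the entries that a box plot would draw separately. The sketch below is based on assumptions that this section does not fix: it uses the widespread rule that whiskers reach at most 1.5 times the interquartile range beyond the box, and it computes the quartiles with the `inclusive` method of Python's `statistics.quantiles`. The box plot convention used elsewhere in these notes may differ in detail. The function name and the example lists are likewise chosen for illustration.

```python
from statistics import quantiles

def count_outliers(values, factor=1.5):
    """Count entries outside the fences Q1 - factor*IQR and Q3 + factor*IQR.

    The 1.5*IQR fence rule and the quartile method are assumptions.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return sum(1 for x in values if x < lower or x > upper)

evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # kurtosis about 1.77
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]      # kurtosis about 4.5

print(count_outliers(evenly_spread))  # 0
print(count_outliers(peaked))         # 2
```

In the second list the box collapses to the single value \(0\), so both distant entries lie outside the fences and would be drawn as separate points.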

How the empirical kurtosis can be assessed from histograms was already described when the notion of kurtosis was introduced. The skewness can also be assessed easily from histograms, since one proceeds essentially as with bar diagrams: left-steep and right-steep features can be told apart by whether the rise of the frequency densities is steeper on the left or on the right flank of the “peak”.

Finally, the following figures compare examples of histograms and (matching) box plots for characteristics of different empirical skewness and curvature.

Figure 3.17: Example: Histograms for different empirical skewness / kurtosis

Figure 3.18: Example: box plots for different empirical skewness / kurtosis