Which measure of central tendency is most at risk of being skewed by outliers?

Measures of central tendency sound like a fancy, complicated statistical term. In reality, a measure of central tendency is simply a summary statistic that tries to capture the average of a dataset.

We will start by looking at the use of measures of central tendency in psychology.
Then we will explore the various forms of measures of central tendency in statistics.
After this, we will review the formulas for the measures of central tendency and work through examples.
Finally, we will discuss the advantages and disadvantages of the measures of central tendency.

Measures of Central Tendency: Psychology

Various measures of central tendency in psychology are used in descriptive statistics.

Central tendency is commonly known as the ‘average’. In more technical terms, it is the data set’s most central or representative number.

So why are researchers interested in the measures of central tendency?

When researchers collect data, they have individual data points, which on their own provide little information. Summarising these data points, however, provides useful information. For instance, we can compare experimental groups or identify potential trends.

Measures of Central Tendency in Statistics

In descriptive statistics, there are three ways to measure central tendency: the mean, the median, and the mode.

Researchers don’t simply pick and choose which of the three they will use. The mean is typically used because it takes every value in a dataset into account, whereas the median and the mode do not to the same extent.

When we collect data that has a non-normal distribution, the mean can be misleading, so the median or mode is used instead.

Distribution refers to how the values in a data set are spread out. Non-normal data can arise when a data set has extreme outliers or when a study recruits a small sample.

Ideally, researchers want data to be normal, but this isn’t always the case. Let’s take a look at the formulas for the different measures of central tendency.

Measures of Central Tendency: Formula

The mean, in simple terms, is ‘average’. It is what you get if you add up all the values in a data set and then divide by the total number of values.

A data set has the values 2, 4, 6, 8, and 10. The mean would be (2+4+6+8+10) ÷ 5 = 6.
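The same calculation can be sketched in Python. This is only an illustration of the formula above; the variable names are our own, and `statistics.mean` is the standard library function for the arithmetic mean.

```python
from statistics import mean

values = [2, 4, 6, 8, 10]

# Add up all the values, then divide by how many values there are.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean, library_mean)
```

Both approaches give 6 for this data set, matching the worked example.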

The median is the data set’s central number when ordered from lowest to highest.

Out of the numbers 2, 3, 6, 11, 14, the median is 6.

It’s always easier to calculate when there is an odd number, but sometimes there is an even number of data points. If a data set has an even number of values, the median is between the two central values.

Out of the numbers 2, 3, 6, 11, 14, and 61, the median is between 6 and 11. We calculate the mean of these two numbers, (6+11) ÷ 2, which is 8.5; thus, the median of this data set is 8.5.
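As a rough sketch, the odd and even cases can be written out in Python. The helper `median_by_hand` is a hypothetical name used for illustration; `statistics.median` from the standard library is shown alongside it for comparison.

```python
from statistics import median


def median_by_hand(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)  # order from lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single central value exists.
        return ordered[middle]
    # Even number of values: average the two central values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_set = [2, 3, 6, 11, 14]
even_set = [2, 3, 6, 11, 14, 61]

print(median_by_hand(odd_set), median(odd_set))
print(median_by_hand(even_set), median(even_set))
```

For the two data sets above, both functions return 6 and 8.5 respectively.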

The mode is the value that occurs most frequently in a data set.

For a data set of 3, 4, 5, 6, 6, 6, 7, 8, 8, the mode is 6.

It is normally used for nominal data (named data that can be separated into categories such as gender, ethnicity, eye colour, and hair colour). However, the mode can be used for any level of data. E.g. for eye colour, we have the categories ‘brown’, ‘blue’, ‘green’, and ‘grey’. The mode tells us which eye colour category has the highest count.
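A short Python sketch can show the mode for both numeric and nominal data. The numeric data set is the one from the example above; the eye-colour observations are made up purely for illustration and are not taken from any study.

```python
from collections import Counter
from statistics import mode

# Numeric data from the example above.
scores = [3, 4, 5, 6, 6, 6, 7, 8, 8]
numeric_mode = mode(scores)

# Hypothetical eye-colour observations (illustrative only).
eye_colours = ["brown", "blue", "brown", "green", "brown", "grey", "blue"]

# Count how often each category occurs and take the most frequent one.
counts = Counter(eye_colours)
most_common_colour, frequency = counts.most_common(1)[0]

print(numeric_mode)
print(most_common_colour, frequency)
```

In this made-up sample, ‘brown’ occurs three times, so it is the mode.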

Measures of Central Tendency: Examples

The table below is an example data set. Let’s use the formulas learned earlier to calculate the averages.

Participants’ Memory Score Before Experiment (%)	Participants’ Memory Score After Experiment (%)
76	74
54	69
68	68
59	72
65	70
76	84
63	65

The research aims to determine whether people performed better after the experiment than before. Which measure of central tendency would be the best to use? If you’ve guessed the mean, then you’d be correct.

The mean score before the experiment is calculated as 76 + 54 + 68 + 59 + 65 + 76 + 63 = 461, which is then divided by 7 to give 65.86 (2 d.p.).

The mean score after the experiment is calculated as 74 + 69 + 68 + 72 + 70 + 84 + 65 = 502, which is then divided by 7 to give 71.71 (2 d.p.).

From the averages, we can see a trend: participants’ memory scores are higher after the experiment than before.
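The two means can be checked with a few lines of Python. The scores are copied from the table above; the variable names are our own.

```python
from statistics import mean

# Memory scores (%) from the table, before and after the experiment.
before = [76, 54, 68, 59, 65, 76, 63]
after = [74, 69, 68, 72, 70, 84, 65]

# Sum of each column, then the mean rounded to 2 decimal places.
total_before, total_after = sum(before), sum(after)
mean_before = round(mean(before), 2)
mean_after = round(mean(after), 2)

print(total_before, mean_before)
print(total_after, mean_after)
```

This reproduces the totals of 461 and 502 and the means of 65.86 and 71.71. Like the hand calculation, it only describes the sample and says nothing about whether the difference generalises.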

However, it’s important to note that we can’t make inferences from the measures of central tendency. Researchers should use inferential statistics for this.

Inferences are when we use statistics to identify if findings can be generalised to the target population.

Only inferential statistics and not descriptive statistics can be used to make inferences. The average, i.e. the measures of central tendency, is supposed to identify patterns and trends and summarise datasets.

Measures of Central Tendency: Advantages and Disadvantages

The mean is a powerful statistic used to estimate population parameters.

Population parameter: When we conduct psychological studies, we use a limited number of participants as it would be impossible to test a whole population.

The measures from these participants are measures of a sample (sample statistics), and we use these sample statistics as an estimate and reflection of the general population (population parameter).

The estimates of population parameters that we derive from the sample mean can be used in inferential statistics.

The mean is the most sensitive and precise of the three measures of central tendency. This is because it is used on interval data (data measured in fixed units with equal distances between each point on the scale, e.g. temperature measured in degrees or IQ test scores). The mean considers the exact distances between values in a data set.

The disadvantage of the mean is that, because it is so sensitive, it can easily be distorted by unrepresentative values (outliers).

A sports coach measures how long it takes for pupils to swim 100m. There are ten pupils; all take around 2 minutes except for one, who takes 5 minutes. Due to this outlier of 5 minutes, the mean will be higher, so it is not entirely representative of the group.
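To make the effect concrete, here is a small Python sketch. The individual times are an assumption: the example only says the pupils take "around 2 minutes", so we use exactly 2.0 minutes for nine pupils and 5.0 minutes for the tenth.

```python
from statistics import mean, median

# Assumed times in minutes: nine pupils at exactly 2.0, one outlier at 5.0.
times = [2.0] * 9 + [5.0]

mean_with_outlier = mean(times)
median_with_outlier = median(times)

# The same group with the slow swimmer left out, for comparison.
mean_without_outlier = mean(times[:-1])

print(mean_with_outlier, median_with_outlier, mean_without_outlier)
```

Under these assumed values, the single slow swimmer pulls the mean up from 2.0 to about 2.3 minutes, while the median stays at 2.0.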

Additionally, as the mean is very precise, sometimes the values calculated do not make sense.

A headteacher would like to calculate the average number of siblings children have at their school. After getting data on all sibling numbers and dividing by the number of pupils, it turns out the mean number of siblings is 2.4. No child can have 2.4 siblings, so the figure does not describe any real pupil.

The advantages of the median are that it is largely unaffected by extreme outliers and can be easier to calculate than, say, the mean.

However, the disadvantage of the median is that it doesn’t account for the exact distances between values like the mean does. Furthermore, it can’t be used to make estimations concerning population parameters.

The mode’s advantage is that it can be used to show which category has the most occurrences in a data set. Similar to the median, it is unaffected by extreme outliers.

There are quite a few disadvantages when it comes to mode, and some of these are:

The mode does not take into account the exact distances between values.
The mode cannot be used in estimates of population parameters.
The mode is not useful for small data sets in which values occur equally frequently, e.g., 5, 6, 7, 8.
The mode is not useful for categories with grouped data, e.g., 1-4, 5-7, 8-10.
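The problem with equally frequent values can be seen in a quick Python sketch. It assumes Python 3.8 or later, where `statistics.multimode` is available; the variable names are our own.

```python
from statistics import multimode

# Every value occurs exactly once, so each one ties for "most frequent".
tied_values = [5, 6, 7, 8]
tied_modes = multimode(tied_values)

# For comparison, a data set with one clear most frequent value.
clear_values = [3, 4, 5, 6, 6, 6, 7, 8, 8]
clear_modes = multimode(clear_values)

print(tied_modes)
print(clear_modes)
```

For 5, 6, 7, 8 every value is returned as a mode, which tells us nothing about where the centre of the data lies.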

Measures of Central Tendency - Key takeaways

The three measures of central tendency in statistics are the mean, median and mode.
The measures of central tendency in psychology summarise datasets and allow researchers to compare them.
The three measures of central tendency are defined as follows:
- The mean is the sum of all the figures divided by how many numbers are in the dataset.
- The median is the middle value of a dataset when ranked from smallest to largest.
- The mode is the most frequent number in a dataset.
The advantages and disadvantages of the measures of central tendency differ; generally, the mean is believed to be the most accurate measure.

Which measure of central tendency can be skewed by outliers?

The mean is the only measure of central tendency that is always affected by an outlier. The mean, or average, is also the most popular measure of central tendency. A common calculator error when finding the mean is forgetting to put parentheses around the sum before dividing.
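The parentheses mistake is easy to reproduce in Python, which follows the same order of operations as a scientific calculator. The numbers below are an arbitrary illustration (2, 4, 6, 8, and 10), and the variable names are our own.

```python
# Without parentheses, only the last value is divided by 5,
# because division is carried out before addition.
without_parentheses = 2 + 4 + 6 + 8 + 10 / 5

# With parentheses, the whole sum is divided by 5, which is the mean.
with_parentheses = (2 + 4 + 6 + 8 + 10) / 5

print(without_parentheses, with_parentheses)
```

The first expression evaluates to 22.0 and the second to 6.0; only the second is the mean of the five values.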

What central tendency is best for skewed data?

The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.

Which central tendency metric should be used to avoid skew from outliers?

The median is the most informative measure of central tendency for skewed distributions or distributions with outliers. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.