- Calculate and use: percentage change, medians, rates, ranges, averages, quartiles.
- Use standard deviation to identify outliers.
- Know what correlation and linear regression are.
Looking for patterns
- Data analysis is finding patterns.
- Finding evidence beyond anecdotes.
- And finding anecdotes too.
Newsroom math
- Bad news: Mathematical mistakes harm your credibility.
- Good news: newsroom math is easy. Adding, subtracting, multiplying, and dividing are all you need.
1) Percentage change
- Comparing a new number to an old number.
- Formula: (NEW - OLD) / OLD * 100 (the division gives a fraction; multiplying by 100 turns it into a percentage)
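A minimal Python sketch of the percentage change formula. The function name `percentage_change` and the budget figures are made up for illustration.

```python
def percentage_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        # Dividing by zero: there is no meaningful percentage change from 0.
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical figures: a budget that grew from 200 to 250.
change = percentage_change(200, 250)
print(f"{change:.1f}%")
```

A negative result means a decrease: going from 100 to 80 is a change of about -20%.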
2) Rates
- Allow you to compare places of different sizes.
- Formula: EVENTS / POPULATION * 'per' unit (for example, per 1,000 or per 100,000 people)
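The rate formula can be sketched the same way. The two towns and their counts below are invented, and `per=100_000` is just one common choice of 'per' unit.

```python
def rate(events, population, per=100_000):
    """Return events per `per` people, so places of different size compare fairly."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


# Hypothetical towns: the larger one has more events in total,
# but the smaller one has the higher rate.
small_town = rate(events=30, population=15_000)
big_city = rate(events=400, population=1_000_000)

print(f"Small town: {small_town:.1f} per 100,000")
print(f"Big city:   {big_city:.1f} per 100,000")
```

Comparing raw counts (30 against 400) would point to the big city; the rates point the other way.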
3) Univariate statistics
Descriptive statistics: taking a single variable in a collection of data and describing its characteristics.
3a) Measures of the Centre:
- Mean (average): total of the values, divided by the number of those values.
- Median: the middle value of an ordered list.
- Mode: the most common value.
- Outliers: atypical values far from the average. This may be where the story is: the places where things differ from the norm.
Example: Soccer salaries in the US
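A short sketch of the three measures of the centre using Python's standard `statistics` module. The salary figures are invented for illustration (in thousands of dollars) and are not real soccer salaries; they only mimic the shape of a salary list with one star earner.

```python
import statistics

# Hypothetical salaries in thousands of dollars; one value is far above the rest.
salaries = [40, 45, 45, 50, 60, 900]

mean_salary = statistics.mean(salaries)      # total divided by the number of values
median_salary = statistics.median(salaries)  # middle of the ordered list
mode_salary = statistics.mode(salaries)      # most common value

print(f"mean:   {mean_salary}")
print(f"median: {median_salary}")
print(f"mode:   {mode_salary}")
```

The single large value pulls the mean up to 190, while the median stays at 47.5, much closer to what most people on the list earn. That gap is a hint that an outlier is present.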
Normal distribution:
- The peak is in the middle near the mean.
- The wider the curve the greater the standard deviation.
- The curve covers 100%.
3b) Variability: how data can vary from the centre:
Measures of variability:
- Maximum and minimum: largest and smallest values.
- Range: the distance between the maximum and minimum.
- Quartiles: the medians of each half of the ordered list of values.
- Halfway down from the median is the first quartile.
- Halfway up from the median is the third quartile.
- Standard deviation: roughly the typical distance of values from the mean (formally, the square root of the average squared distance).
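The measures of variability above could be computed roughly as follows. The helper name `describe_spread` and the data are hypothetical. Quartiles are taken as the medians of each half of the ordered list, as described in these notes; spreadsheet and statistics packages may use slightly different quartile conventions. The sketch uses the population standard deviation (`statistics.pstdev`), which is an assumption.

```python
import statistics


def describe_spread(values):
    """Summarise variability; expects at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    # Split into halves; with an odd count the middle value is left out of both.
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "q1": statistics.median(lower_half),
        "median": statistics.median(ordered),
        "q3": statistics.median(upper_half),
        "stdev": statistics.pstdev(ordered),
    }


# Hypothetical data.
spread = describe_spread([1, 3, 5, 7, 9, 11, 13, 15])
for name, value in spread.items():
    print(f"{name}: {value}")
```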
3c) Standard deviation
- Helps define whether a value is in fact a true outlier.
- A value is commonly treated as an outlier if it lies more than 3 StdDev from the mean.
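A minimal sketch of the 3 StdDev check. The function name `find_outliers` and the readings are invented, and the population standard deviation is again an assumption.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` StdDev from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mean) > threshold * sd]


# Hypothetical readings: nineteen values near 50 and one far away.
readings = [48, 52, 50, 49, 51, 47, 53, 50, 50, 49,
            51, 52, 48, 50, 50, 49, 51, 50, 50, 200]

print(find_outliers(readings))
```

In very small datasets an extreme value inflates the standard deviation so much that it may never cross the 3 StdDev line, which is why the sample here has 20 values.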
Empirical rule (for normally distributed data)
- 68% of values within 1 StdDev of mean
- 95% of values within 2 StdDev of mean
- 99.7% of values within 3 StdDev of mean
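One way to see the empirical rule in action is to simulate normally distributed values and count how many fall within 1, 2, and 3 StdDev of the mean. This is only a sketch: the mean of 100, StdDev of 15, sample size, and seed are arbitrary choices, and a simulation gives shares close to, not exactly, 68%, 95%, and 99.7%.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable
values = [random.gauss(100, 15) for _ in range(50_000)]

mean = statistics.fmean(values)
sd = statistics.pstdev(values)


def share_within(k):
    """Fraction of values no more than k StdDev from the mean."""
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)


for k in (1, 2, 3):
    print(f"within {k} StdDev: {share_within(k):.1%}")
```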
Normal
- Variability is normal
- Values within 3 StdDev are considered normal
4) Multivariate statistics
4a) Correlation
- The relationship between two or more variables in your data.
- Pearson’s r: measures the strength and direction of the linear relationship between two variables; ranges from -1 to 1
- Positive r: if one variable goes up, the other goes up.
- Negative r: if one variable goes up, the other goes down.
- Correlation does not imply causation.
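Pearson's r can be computed by hand with nothing more than sums and a square root. The sketch below assumes two equal-length lists; the name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together, and how much each moves alone.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_movement / sqrt(spread_x * spread_y)


# Hypothetical data: y tends to rise with x in the first pair, fall in the second.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]
falling = [5, 4, 5, 4, 2]

print(f"positive r: {pearson_r(x, rising):.2f}")
print(f"negative r: {pearson_r(x, falling):.2f}")
```

A value near 1 or -1 signals a strong linear relationship and a value near 0 a weak one; neither says anything about which variable, if any, causes the other.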
4b) Linear regression
- Used to predict the dependent variable, based on the value of the independent variable.
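A minimal sketch of simple linear regression with one independent variable. It uses ordinary least squares, the usual fitting method, though these notes do not name one. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("cannot fit a line when x never changes")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x


# Hypothetical data: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction for x = 6: {predict(6, slope, intercept):.2f}")
```

Predictions far outside the range of the observed x values should be treated with caution, since the line is only fitted to the data at hand.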
I’ve learned
- Calculate and use: percentage change, medians, rates, ranges, averages, quartiles.
- Use standard deviation to identify outliers.
- Know what correlation and linear regression are.