Newsroom math and statistics

  • Calculate and use: percentage change, medians, rates, ranges, averages, quartiles.
  • Use standard deviation to identify outliers.
  • Know what correlation and linear regression are.

Looking for patterns 

  • Data analysis is finding patterns.
  • Finding evidence beyond anecdotes.
  • And finding anecdotes too.

 Newsroom math 

  • Bad news: Mathematical mistakes harm your credibility.
  • Good news: newsroom math is easy. Adding, subtracting, multiplying, and dividing are all you need.

 

1) Percentage change

  • Comparing a new number to an old number.
  • Formula: (NEW – OLD) / OLD × 100 (multiplying by 100 turns the ratio into a percentage)
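
As a minimal sketch of the percentage change formula, the function below subtracts the old value from the new one, divides by the old value, and multiplies by 100 to express the result as a percent. The function name `percentage_change` and the budget figures are made up for illustration.

```python
def percentage_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        # Dividing by zero is undefined, so there is no percentage change from 0.
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical figures: a budget that grew from 200 to 250.
increase = percentage_change(200, 250)

# Hypothetical figures: a budget that shrank from 200 to 150.
decrease = percentage_change(200, 150)

print(f"Increase: {increase:.1f}%")
print(f"Decrease: {decrease:.1f}%")
```

A negative result means the new number is smaller than the old one.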

2) Rates 

  • Rates allow you to compare places of different sizes.
  • Formula: (EVENTS / POPULATION) × ‘Per’ Unit, for example per 1,000 or per 100,000 people.
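
The rate formula can be sketched as follows. The function name `rate`, the default of 100,000 for the 'per' unit, and the town and city figures are all assumptions chosen for illustration, not real data.

```python
def rate(events, population, per=100_000):
    """Return the number of events per `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


# Hypothetical data: a small town and a large city.
town_rate = rate(events=50, population=25_000)
city_rate = rate(events=300, population=600_000)

print(f"Town: {town_rate:.1f} per 100,000")
print(f"City: {city_rate:.1f} per 100,000")
```

In this made-up example the city has six times as many events as the town, but the town's rate is higher once population is taken into account.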

3) Univariate statistics 

Descriptive statistics: taking a single variable in a collection of data and describing its characteristics.

3a) Measures of the Centre:

  • Mean (average): total of the values, divided by the number of those values.
  • Median: the middle value of an ordered list.
  • Mode: the most common value.
  • Outliers: atypical values far from the average. They are often where the story is, because they show where things differ from the norm.
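
A short sketch of the three measures of the centre, using Python's standard `statistics` module. The salary figures are invented for illustration and are not real data; the single very large value is there to show how an outlier affects each measure.

```python
import statistics

# Hypothetical salaries, including one very high earner.
salaries = [40_000, 45_000, 45_000, 50_000, 60_000, 1_000_000]

mean_salary = statistics.mean(salaries)      # total divided by the count
median_salary = statistics.median(salaries)  # middle of the ordered list
mode_salary = statistics.mode(salaries)      # most common value

print(f"Mean:   {mean_salary:,.0f}")
print(f"Median: {median_salary:,.0f}")
print(f"Mode:   {mode_salary:,.0f}")
```

With these numbers the one large salary pulls the mean far above what most people earn, while the median stays close to the typical value.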

Example: Soccer salaries in the US  

Normal distribution: 

  • The peak is in the middle, at the mean.
  • The wider the curve, the greater the standard deviation.
  • The area under the curve covers 100% of the values.

3b) Variability: how data can vary from the centre: 

Measures of variability:

  • Maximum and minimum: largest and smallest values.
  • Range: the distance between the maximum and minimum.
  • Quartiles: the medians of each half of the ordered list of values.
    • Halfway down from the median is the first quartile.
    • Halfway up from the median is the third quartile.
  • Standard deviation: roughly the typical distance of values from the mean (strictly, the square root of the average squared distance from the mean).
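
The measures of variability above can be sketched like this. The helper `quartiles` is a hypothetical name and follows the "median of each half" definition given in the list, leaving the middle value out of both halves when the count is odd; statistical software often uses slightly different quartile conventions. The data are made up, and `statistics.pstdev` (the population standard deviation) is one possible choice; `statistics.stdev` would give the sample version instead.

```python
import statistics


def quartiles(values):
    """Return (first quartile, median, third quartile) using medians of each half."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the median
    upper_half = ordered[-half:]   # values above the median
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )


# Hypothetical data.
values = [2, 4, 4, 5, 7, 9, 11]

minimum = min(values)
maximum = max(values)
value_range = maximum - minimum
q1, median, q3 = quartiles(values)
std_dev = statistics.pstdev(values)

print(f"Min {minimum}, max {maximum}, range {value_range}")
print(f"Q1 {q1}, median {median}, Q3 {q3}")
print(f"Standard deviation {std_dev:.2f}")
```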

3c) Standard deviation 

  • Helps determine whether a value is a true outlier.
  • A value can reliably be called an outlier if it lies more than 3 StdDev from the mean.
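
A minimal sketch of the 3 StdDev test for outliers. The function name `find_outliers`, the use of the population standard deviation, and the data are assumptions for illustration.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        # Every value is identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mean) / std_dev > threshold]


# Hypothetical data: nineteen values near 50 and one value of 200.
values = [48, 52, 50, 49, 51, 50, 47, 53, 50, 49,
          51, 50, 48, 52, 50, 49, 51, 50, 50, 200]

outliers = find_outliers(values)
print(outliers)
```

One caveat: an extreme value also inflates the standard deviation itself, so in a very small dataset no value may cross the 3 StdDev line even when one clearly stands out. The test works better with more data points.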

Empirical rule 

  • 68% of values within 1 StdDev of mean
  • 95% of values within 2 StdDev of mean
  • 99.7% of values within 3 StdDev of mean
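
The empirical rule can be checked against an ideal normal curve. This sketch uses `statistics.NormalDist` from Python's standard library (available from Python 3.8); the helper name `share_within` is hypothetical. Real datasets are only approximately normal, so their shares will differ somewhat.

```python
from statistics import NormalDist

# A standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Return the share of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Within {k} StdDev: {share_within(k):.1%}")
```

The rounded figures in the rule (68%, 95%, 99.7%) are approximations of these shares.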

Normal 

  • Variability is normal
  • Values within 3 StdDev are considered normal

4) Multivariate statistics 

4a) Correlation

  • The relationship between two or more variables in your data.
  • Pearson’s r: ranges from -1 to 1; values near 0 indicate little or no linear relationship.
    • Positive r: as one variable goes up, the other tends to go up.
    • Negative r: as one variable goes up, the other tends to go down.
  • Correlation does not imply causation.
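
Pearson's r for two variables can be computed by hand, as in the sketch below. The function name `pearson_r` and the data are invented for illustration; from Python 3.10 the standard library also offers `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical data: y rises with x in the first pair and falls in the second.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(f"Rising:  r = {pearson_r(xs, rising):.2f}")
print(f"Falling: r = {pearson_r(xs, falling):.2f}")
```

These made-up values lie exactly on a straight line, so r sits at the extremes of its range. Real data almost always fall somewhere in between.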

4b) Linear regression

  • Used to predict the value of the dependent variable from the value of the independent variable, by fitting a straight line through the data.
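
A minimal sketch of simple linear regression with one independent variable. It assumes the usual ordinary least squares fit, which these notes do not name explicitly; the function names `fit_line` and `predict` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    if variation_x == 0:
        raise ValueError("cannot fit a line when the independent variable never changes")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / variation_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable for a given value of the independent variable."""
    return slope * x + intercept


# Hypothetical data: x is the independent variable, y the dependent one.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = predict(6, slope, intercept)
print(f"y = {slope:.1f} * x + {intercept:.1f}; predicted y at x = 6: {prediction:.1f}")
```

As with correlation, a line that fits well does not show that the independent variable causes the change in the dependent one.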

 

I’ve learned 

  • Calculate and use: percentage change, medians, rates, ranges, averages, quartiles.
  • Use standard deviation to identify outliers.
  • Know what correlation and linear regression are.