Normal Distribution

1. Introduction to Probability Distributions

1.1. What is a Probability Distribution?

A probability distribution describes how probabilities are distributed over the values of a random variable. It specifies the likelihood of different outcomes in an experiment or observation.

1.2. Types of Probability Distributions

  • Discrete Distributions: For countable outcomes (e.g., binomial, Poisson)
  • Continuous Distributions: For measurable outcomes (e.g., normal, exponential)

2. The Normal Distribution

2.1. Definition and Properties

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution characterized by its bell-shaped curve.

Key Properties:

  • Symmetrical about the mean
  • Mean = Median = Mode
  • Defined by two parameters: mean (\(\mu\)) and standard deviation (\(\sigma\))
  • Total area under the curve equals 1
  • Follows the Empirical Rule (68-95-99.7 rule)

2.2. Probability Density Function

The probability density function (PDF) of the normal distribution is:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]

Where:

  • \(\mu\) = mean
  • \(\sigma\) = standard deviation
  • \(\pi\) ≈ 3.14159
  • \(e\) ≈ 2.71828
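As a quick sanity check, the PDF formula can be evaluated directly and compared against R's built-in dnorm(). This is a minimal sketch; the values \(\mu = 100\), \(\sigma = 15\), and \(x = 130\) are arbitrary choices for illustration.

```r
# Evaluate the normal PDF by hand and via dnorm()
mu <- 100
sigma <- 15
x <- 130

manual <- (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma)^2)
builtin <- dnorm(x, mean = mu, sd = sigma)

all.equal(manual, builtin)  # TRUE
```

Both expressions give the height of the bell curve at \(x\), not a probability; for a continuous distribution, probabilities come from areas under the curve.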

2.3. Empirical Rule (68-95-99.7 Rule)

For normally distributed data:

  • Approximately 68% of data falls within \(\pm1\) standard deviation from the mean
  • Approximately 95% of data falls within \(\pm2\) standard deviations from the mean
  • Approximately 99.7% of data falls within \(\pm3\) standard deviations from the mean
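The three percentages above can be verified with pnorm(), since \(P(\mu - k\sigma < X < \mu + k\sigma)\) is the same for every normal distribution; a short sketch:

```r
# Verify the 68-95-99.7 rule: P(mu - k*sigma < X < mu + k*sigma)
# equals pnorm(k) - pnorm(-k) on the standard normal
within_k <- function(k) pnorm(k) - pnorm(-k)

round(within_k(1), 4)  # 0.6827
round(within_k(2), 4)  # 0.9545
round(within_k(3), 4)  # 0.9973
```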

3. Distribution Shape Characteristics

3.1. Skewness

Skewness measures the asymmetry of a probability distribution around its mean. It indicates whether data are concentrated more on one side of the distribution.

Types of Skewness:

  • Positive Skew (Right Skew): Tail extends to the right, mean > median > mode
  • Negative Skew (Left Skew): Tail extends to the left, mean < median < mode
  • Zero Skew: Symmetrical distribution, mean = median = mode

Calculation: See [[Descriptive Statistics]]

3.2. Kurtosis

Kurtosis measures the "tailedness" of a probability distribution, indicating how much data are in the tails compared to a normal distribution.

Types of Kurtosis:

  • Mesokurtic: Normal distribution, kurtosis = 3 (excess kurtosis = 0)
  • Leptokurtic: Heavy tails and sharp peak, kurtosis > 3 (excess kurtosis > 0)
  • Platykurtic: Light tails and flat peak, kurtosis < 3 (excess kurtosis < 0)

Calculation: See [[Descriptive Statistics]]

4. Standard Normal Distribution (Z-Distribution)

4.1. Definition

The standard normal distribution is a special case of the normal distribution with:

  • Mean (\(\mu\)) = 0
  • Standard deviation (\(\sigma\)) = 1

4.2. Z-Scores

A z-score (standard score) measures how many standard deviations an observation is from the mean:

\[ z = \frac{x - \mu}{\sigma} \]

Interpretation:

  • \(z = 0\): Value equals the mean
  • \(z > 0\): Value above the mean
  • \(z < 0\): Value below the mean
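The formula translates directly into R. The parameters below (\(\mu = 100\), \(\sigma = 15\)) are an illustrative IQ-style example, not anything required by the definition:

```r
# z-score: how many standard deviations x lies from the mean
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_score(115, mu = 100, sigma = 15)  #  1 -> one SD above the mean
z_score(100, mu = 100, sigma = 15)  #  0 -> equal to the mean
z_score(70,  mu = 100, sigma = 15)  # -2 -> two SDs below the mean
```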

4.3. Z-Table and Probability Calculations

Z-tables provide the cumulative probability from \(-\infty\) to a given z-value. Common z-values and their probabilities:

  Z-Score   Cumulative Probability
  -3.0      0.0013
  -2.0      0.0228
  -1.0      0.1587
   0.0      0.5000
   1.0      0.8413
   2.0      0.9772
   3.0      0.9987
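In R, pnorm() plays the role of the z-table: it returns the cumulative probability up to a given z-value. A quick sketch reproducing the table entries:

```r
# Reproduce the z-table with pnorm() (cumulative probability up to z)
z <- c(-3, -2, -1, 0, 1, 2, 3)
round(pnorm(z), 4)
# -> 0.0013 0.0228 0.1587 0.5000 0.8413 0.9772 0.9987
```

Note the symmetry: pnorm(-z) equals 1 - pnorm(z) for any z.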

5. Student's t-Distribution

5.1. Definition and Purpose

The t-distribution is used when:

  • Sample sizes are small (\(n < 30\))
  • The population standard deviation is unknown
  • We need to estimate population parameters from sample data

5.2. Properties

  • Similar bell shape to normal distribution
  • Heavier tails than normal distribution (more probability in extremes)
  • Approaches normal distribution as degrees of freedom increase
  • Defined by degrees of freedom (\(df = n - 1\))
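The heavier tails and the convergence to the normal distribution can both be seen in the critical values. A minimal sketch comparing the 97.5th percentile (the two-tailed 5% cutoff) across degrees of freedom:

```r
# Heavier tails: the t critical value shrinks toward the normal value
# as degrees of freedom increase
qt(0.975, df = 5)     # ~2.571
qt(0.975, df = 30)    # ~2.042
qt(0.975, df = 1000)  # ~1.962
qnorm(0.975)          # ~1.960
```

With small \(df\), an observation must be further from the mean (in standard errors) to reach the same tail probability.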

5.3. Degrees of Freedom

Degrees of freedom represent the number of independent pieces of information available to estimate a parameter:

\[ df = n - 1 \]

Where \(n\) is the sample size.

5.4. T-Scores

T-scores are calculated similarly to z-scores but use sample standard deviation:

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

Where:

  • \(\bar{x}\) = sample mean
  • \(\mu\) = population mean (hypothesized)
  • \(s\) = sample standard deviation
  • \(n\) = sample size
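The formula can be applied to raw data, where R's sd() supplies the sample standard deviation. The data vector and the hypothesized mean mu0 below are made up for illustration; t.test() is shown only to confirm that it computes the same statistic:

```r
# t-statistic from raw data (hypothetical sample; mu0 is the hypothesized mean)
x <- c(52, 48, 55, 51, 49, 53, 50, 54)
mu0 <- 50

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_stat  # ~1.732

# t.test() computes the same statistic (plus df and a p-value)
t.test(x, mu = mu0)$statistic
```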

6. Comparing Z and T Distributions

  Characteristic   Z-Distribution                             T-Distribution
  When to Use      \(\sigma\) known, large \(n\)              \(\sigma\) unknown, small \(n\)
  Parameters       \(\mu\), \(\sigma\)                        \(\mu\), \(s\), \(df\)
  Shape            Fixed bell curve                           Varies with \(df\)
  Tails            Lighter                                    Heavier
  Applications     Hypothesis testing, confidence intervals   Same, but for small samples

7. Other Important Distributions

7.1. Bimodal Distribution

  • Has two distinct peaks or modes
  • Often indicates two different populations or processes
  • Common in mixed data sets

7.2. Uniform Distribution

  • All outcomes equally likely
  • Rectangular shape
  • Constant probability density function

7.3. Other Common Distributions

  • Binomial: For binary outcomes
  • Poisson: For count data
  • Exponential: For time between events
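R provides the same d/p/q/r function families for these distributions. A brief sketch with arbitrary example parameters:

```r
# Other common distributions (example parameters chosen for illustration)
dbinom(3, size = 10, prob = 0.5)  # P(exactly 3 successes in 10 fair trials)
dpois(2, lambda = 4)              # P(exactly 2 events) when the mean count is 4
pexp(1, rate = 2)                 # P(waiting time <= 1) when events occur at rate 2
```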

8. Applications in Psychological Research

8.1. Hypothesis Testing

  • Using z-tests for large samples with known population parameters
  • Using t-tests for small samples or unknown population parameters

8.2. Confidence Intervals

  • Constructing intervals for population means
  • Determining margin of error

8.3. Effect Size Calculations

  • Standardizing measures for comparison across studies
  • Cohen's d and other effect size metrics

9. Practical Examples

9.1. Example 1: Z-Score Calculation

Given: \(\mu = 100\), \(\sigma = 15\), \(x = 130\)

\[ z = \frac{130 - 100}{15} = 2.0 \]

Interpretation: This score is 2 standard deviations above the mean.

9.2. Example 2: T-Score Calculation

Given: \(\mu = 50\), \(\bar{x} = 55\), \(s = 8\), \(n = 25\)

\[ t = \frac{55 - 50}{8/\sqrt{25}} = \frac{5}{1.6} = 3.125 \]

\(df = 25 - 1 = 24\)
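Both worked examples translate directly into R. The two-tailed p-value at the end is an extra step not computed in the examples above, included to show what the t-statistic is typically used for:

```r
# Example 1: z-score
z <- (130 - 100) / 15
z  # 2

# Example 2: t-score
t <- (55 - 50) / (8 / sqrt(25))
t  # 3.125
df <- 25 - 1  # 24

# Two-tailed p-value for the t example
2 * pt(-abs(t), df = df)
```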

10. R Implementation

10.1. Normal Distribution Functions

# Probability density
dnorm(x, mean = 0, sd = 1)

# Cumulative probability
pnorm(q, mean = 0, sd = 1)

# Quantile function
qnorm(p, mean = 0, sd = 1)

# Random generation
rnorm(n, mean = 0, sd = 1)

10.2. T-Distribution Functions

# Probability density
dt(x, df)

# Cumulative probability
pt(q, df)

# Quantile function
qt(p, df)

# Random generation
rt(n, df)

10.3. Sample Standard Deviation

sample_sd <- sd(data) # Sample standard deviation (divides by n - 1)
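A small demonstration of the \(n - 1\) denominator, using a made-up data vector; sd() matches the manual sample formula, not the population formula:

```r
# sd() divides by n - 1 (sample SD); the population SD divides by n
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data set

n <- length(x)
pop_sd <- sqrt(sum((x - mean(x))^2) / n)          # population SD: 2
sample_sd <- sqrt(sum((x - mean(x))^2) / (n - 1)) # sample SD: ~2.138

all.equal(sample_sd, sd(x))  # TRUE
```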

11. Summary

  • The normal distribution is fundamental in statistics with predictable properties
  • Z-distribution is used when population parameters are known
  • T-distribution is used for small samples with unknown population parameters
  • Understanding distribution shapes (skewness, kurtosis) helps interpret data patterns
  • These distributions form the basis for many statistical tests in psychological research