Normal Distribution

1. Introduction to Probability Distributions

1.1. What is a Probability Distribution?

A probability distribution describes how probabilities are distributed over the values of a random variable. It specifies the likelihood of different outcomes in an experiment or observation.

1.2. Types of Probability Distributions

Discrete Distributions: For countable outcomes (e.g., binomial, Poisson)
Continuous Distributions: For measurable outcomes (e.g., normal, exponential)

2. The Normal Distribution

2.1. Definition and Properties

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution characterized by its bell-shaped curve.

Key Properties: - Symmetrical about the mean - Mean = Median = Mode - Defined by two parameters: mean (\(\mu\)) and standard deviation (\(\sigma\)) - Total area under the curve equals 1 - Follows the Empirical Rule (68-95-99.7 rule)

2.2. Probability Density Function

The probability density function (PDF) of the normal distribution is:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]

Where:

\(\mu\) = mean
\(\sigma\) = standard deviation
\(\pi\) ≈ 3.14159
\(e\) ≈ 2.71828

2.3. Empirical Rule (68-95-99.7 Rule)

For normally distributed data:

Approximately 68% of data falls within \(\pm1\) standard deviation from the mean
Approximately 95% of data falls within \(\pm2\) standard deviations from the mean
Approximately 99.7% of data falls within \(\pm3\) standard deviations from the mean

3. Distribution Shape Characteristics

3.1. Skewness

Skewness measures the asymmetry of a probability distribution around its mean. It indicates whether data are concentrated more on one side of the distribution.

Types of Skewness:

Positive Skew (Right Skew): Tail extends to the right, mean > median > mode
Negative Skew (Left Skew): Tail extends to the left, mean < median < mode
Zero Skew: Symmetrical distribution, mean = median = mode

Calculation: See [[Descriptive Statistics]]

3.2. Kurtosis

Kurtosis measures the "tailedness" of a probability distribution, indicating how much data are in the tails compared to a normal distribution.

Types of Kurtosis:

Mesokurtic: Normal distribution, kurtosis = 3 (excess kurtosis = 0)
Leptokurtic: Heavy tails and sharp peak, kurtosis > 3 (excess kurtosis > 0)
Platykurtic: Light tails and flat peak, kurtosis < 3 (excess kurtosis < 0)

Calculation: See [[Descriptive Statistics]]

4. Standard Normal Distribution (Z-Distribution)

4.1. Definition

The standard normal distribution is a special case of the normal distribution with:

Mean (\(\mu\)) = 0
Standard deviation (\(\sigma\)) = 1

4.2. Z-Scores

A z-score (standard score) measures how many standard deviations an observation is from the mean:

\[ z = \frac{x - \mu}{\sigma} \]

Interpretation:

\(z = 0\): Value equals the mean
\(z > 0\): Value above the mean
\(z < 0\): Value below the mean

4.3. Z-Table and Probability Calculations

Z-tables provide the cumulative probability from \(-\infty\) to a given z-value. Common z-values and their probabilities:

Z-Score	Cumulative Probability
-3.0	0.0013
-2.0	0.0228
-1.0	0.1587
0.0	0.5000
1.0	0.8413
2.0	0.9772
3.0	0.9987

5. Student's t-Distribution

5.1. Definition and Purpose

The t-distribution is used when: - Sample sizes are small (\(n < 30\)) - Population standard deviation is unknown - We need to estimate population parameters from sample data

5.2. Properties

Similar bell shape to normal distribution
Heavier tails than normal distribution (more probability in extremes)
Approaches normal distribution as degrees of freedom increase
Defined by degrees of freedom (\(df = n - 1\))

5.3. Degrees of Freedom

Degrees of freedom represent the number of independent pieces of information available to estimate a parameter:

\[ df = n - 1 \]

Where \(n\) is the sample size.

5.4. T-Scores

T-scores are calculated similarly to z-scores but use sample standard deviation:

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

Where:

\(\bar{x}\) = sample mean
\(\mu\) = population mean (hypothesized)
\(s\) = sample standard deviation
\(n\) = sample size

6. Comparing Z and T Distributions

Characteristic	Z-Distribution	T-Distribution
When to Use	\(\sigma\) known, large \(n\)	\(\sigma\) unknown, small \(n\)
Parameters	\(\mu\), \(\sigma\)	\(\mu\), \(s\), \(df\)
Shape	Fixed bell curve	Varies with \(df\)
Tails	Lighter	Heavier
Applications	Hypothesis testing, confidence intervals	Same, but for small samples

7. Other Important Distributions

7.1. Bimodal Distribution

Has two distinct peaks or modes
Often indicates two different populations or processes
Common in mixed data sets

7.2. Uniform Distribution

All outcomes equally likely
Rectangular shape
Constant probability density function

7.3. Other Common Distributions

Binomial: For binary outcomes
Poisson: For count data
Exponential: For time between events

8. Applications in Psychological Research

8.1. Hypothesis Testing

Using z-tests for large samples with known population parameters
Using t-tests for small samples or unknown population parameters

8.2. Confidence Intervals

Constructing intervals for population means
Determining margin of error

8.3. Effect Size Calculations

Standardizing measures for comparison across studies
Cohen's d and other effect size metrics

9. Practical Examples

9.1. Example 1: Z-Score Calculation

Given: \(\mu = 100\), \(\sigma = 15\), \(x = 130\)

\[ z = \frac{130 - 100}{15} = 2.0 \]

Interpretation: This score is 2 standard deviations above the mean.

9.2. Example 2: T-Score Calculation

Given: \(\mu = 50\), \(\bar{x} = 55\), \(s = 8\), \(n = 25\)

\[ t = \frac{55 - 50}{8/\sqrt{25}} = \frac{5}{1.6} = 3.125 \]

\(df = 25 - 1 = 24\)

10. R Implementation

10.1. Normal Distribution Functions

# Probability density
dnorm(x, mean = 0, sd = 1)

# Cumulative probability
pnorm(q, mean = 0, sd = 1)

# Quantile function
qnorm(p, mean = 0, sd = 1)

# Random generation
rnorm(n, mean = 0, sd = 1)

10.2. T-Distribution Functions

# Probability density
dt(x, df)

# Cumulative probability
pt(q, df)

# Quantile function
qt(p, df)

# Random generation
rt(n, df)

10.3. Sample Standard Deviation

sample_sd <- sd(data) # Sample standard deviation

11. Summary

The normal distribution is fundamental in statistics with predictable properties
Z-distribution is used when population parameters are known
T-distribution is used for small samples with unknown population parameters
Understanding distribution shapes (skewness, kurtosis) helps interpret data patterns
These distributions form the basis for many statistical tests in psychological research