What is Descriptive Statistics?
Descriptive statistics are numerical and graphical methods used to summarize and describe the main features of a dataset. They provide simple summaries about the sample and the measures, forming the basis of quantitative data analysis.
Key Components:
- Measures of Central Tendency: Mean, median, mode - describe the center of the data
- Measures of Dispersion: Range, variance, standard deviation - describe the spread of data
- Measures of Distribution: Skewness, kurtosis, quartiles - describe the shape of data distribution
- Graphical Representations: Histograms, box plots, scatter plots - visualize data patterns
Population vs Sample
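Population statistics describe an entire group, while sample statistics describe a subset drawn from it. Variance and standard deviation use a different divisor in each case. Dividing by n-1 (Bessel's correction) makes the sample variance an unbiased estimate of the population variance.
Population: σ² = Σ(x - μ)² / N
Sample: s² = Σ(x - x̄)² / (n-1)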
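The population formula divides the sum of squared deviations by N, and the sample formula divides it by n-1. The minimal Python sketch below shows both. `variance_from_scratch` is a hypothetical helper written only for illustration. Its results are compared with the standard-library `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n-1).

```python
import statistics

def variance_from_scratch(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor

data = [2, 4, 6, 8, 10]
pop_var = variance_from_scratch(data, sample=False)   # 40 / 5
samp_var = variance_from_scratch(data, sample=True)   # 40 / 4

print(pop_var, samp_var)  # 8.0 10.0
# The standard library should agree with both versions.
print(pop_var == statistics.pvariance(data), samp_var == statistics.variance(data))
```

For the same data, the sample variance is always a little larger than the population variance because of the smaller divisor.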
Data Types
Different statistical measures are appropriate for different types of data: nominal, ordinal, interval, and ratio scales.
Nominal: Mode
Ordinal: Median, mode
Interval/Ratio: All measures
Statistical Significance
Descriptive statistics form the foundation for inferential statistics, hypothesis testing, and making predictions about populations.
Variance → Hypothesis tests
Distribution → Model selection
Measures of Central Tendency
These measures describe the center or typical value of a dataset.
Mean (Average)
The arithmetic average of all values. Most common measure but sensitive to outliers.
Mean = (2+4+6+8+10)/5 = 30/5 = 6
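The following Python sketch reproduces the calculation above. It then shows the mean's sensitivity to outliers on a hypothetical variant of the data, in which the largest value is replaced by an extreme one. The median is printed alongside for comparison.

```python
import statistics

values = [2, 4, 6, 8, 10]
mean = sum(values) / len(values)                    # 30 / 5 = 6.0
print(mean, mean == statistics.mean(values))        # 6.0 True

# Hypothetical: replace the largest value with an extreme outlier.
with_outlier = [2, 4, 6, 8, 100]
mean_with_outlier = sum(with_outlier) / len(with_outlier)   # 120 / 5 = 24.0
median_with_outlier = statistics.median(with_outlier)       # middle value is still 6
print(mean_with_outlier, median_with_outlier)
```

A single extreme value moves the mean from 6 to 24, while the median does not change.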
Median (Middle Value)
The middle value when data is sorted. Robust to outliers and skewed distributions.
Sorted: [1, 3, 5, 7, 9]
Median = 5
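The example above has an odd number of values. When n is even, there is no single middle value, and the usual convention is to average the two middle values. A minimal sketch with a hypothetical `median` helper, checked against `statistics.median`:

```python
import statistics

def median(data):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([9, 1, 7, 3, 5])        # sorted [1, 3, 5, 7, 9] -> 5
even_case = median([9, 1, 7, 3, 5, 11])   # sorted [1, 3, 5, 7, 9, 11] -> (5 + 7) / 2
print(odd_case, even_case)                # 5 6.0
print(even_case == statistics.median([9, 1, 7, 3, 5, 11]))  # True
```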
Mode (Most Frequent)
The value that appears most frequently. Can have multiple modes or no mode.
Mode = 3 (appears 3 times)
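The dataset for this example is not shown, so the sketch below uses hypothetical data in which 3 appears three times. `statistics.multimode` (Python 3.8+) returns every value tied for the highest count. This covers the "multiple modes" case. It also covers the case where every value appears equally often, which is often described as having no mode.

```python
import statistics

data = [1, 3, 3, 3, 5, 7, 7]                  # hypothetical data
print(statistics.mode(data))                  # 3
print(statistics.multimode([1, 1, 2, 2, 5]))  # [1, 2]  (two modes)
print(statistics.multimode([4, 8, 15]))       # [4, 8, 15]  (all values tie)
```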
When to Use Each
Mean for symmetric data, median for skewed data or outliers, mode for categorical data or identifying peaks.
Symmetric: Use mean
Skewed: Use median
Categorical: Use mode
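Relationship between mean, median, and mode: In a perfectly symmetric unimodal distribution such as the normal, mean = median = mode. In right-skewed distributions, the typical ordering is mean > median > mode. In left-skewed distributions, it is mean < median < mode. These orderings are rules of thumb and can fail, especially for multimodal or discrete data.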
Measures of Dispersion
These measures describe how spread out or variable the data is.
Range
The difference between maximum and minimum values. Simple but sensitive to outliers.
Range = 50 - 10 = 40
Variance
The average of the squared deviations from the mean, expressed in squared units of the data. The sample variance divides by n-1 instead of n.
s² = Σ(x - x̄)² / (n-1) (sample)
Standard Deviation
The square root of the variance, expressed in the same units as the data. The most common measure of spread.
s = √s² (sample)
Std Dev = √25 = 5
Interquartile Range (IQR)
Range of middle 50% of data. Robust to outliers. Q3 - Q1.
IQR = 75 - 25 = 50
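The sketch below compares range and IQR on hypothetical data, assuming Python 3.8+ for `statistics.quantiles`. It shows that one extreme value changes the range but not the IQR. Note that quartile conventions differ between tools. `statistics.quantiles` defaults to `method="exclusive"`, which places quartiles at positions (n+1)p. Other software may use a different rule and give slightly different values.

```python
import statistics

def iqr(data):
    # Q1 and Q3 using statistics.quantiles' default "exclusive" method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

def data_range(data):
    return max(data) - min(data)

clean = [10, 20, 25, 30, 35, 40, 50]          # hypothetical data
with_outlier = [10, 20, 25, 30, 35, 40, 500]  # largest value made extreme

print(data_range(clean), data_range(with_outlier))  # 40 490
print(iqr(clean), iqr(with_outlier))                # 20.0 20.0
```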
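For approximately normal (bell-shaped) data, the Empirical Rule states:
• About 68% of data lies within ±1 standard deviation of the mean
• About 95% of data lies within ±2 standard deviations of the mean
• About 99.7% of data lies within ±3 standard deviations of the mean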
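The 68-95-99.7 percentages come from the normal distribution itself. The sketch below recovers them with the standard library's `statistics.NormalDist` (Python 3.8+). It computes the probability mass between -k and +k standard deviations of a standard normal. The numbers apply to normal data only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)
# Share of a normal distribution lying within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ±{k} SD: {share:.4f}")   # 0.6827, 0.9545, 0.9973
```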
Distribution Measures
These measures describe the shape and characteristics of data distribution.
Quartiles
Divide data into four equal parts. Q1 (25th percentile), Q2 (median, 50th), Q3 (75th).
25% below Q1
25% between Q1 and Q2
25% between Q2 and Q3
25% above Q3
Skewness
Measures the asymmetry of a distribution. Positive = right skew, negative = left skew, zero = symmetric. The mean typically lies on the side of the longer tail:
Right skew: Mean > Median
Left skew: Mean < Median
Symmetric: Mean = Median
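One common way to quantify skewness is the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below assumes this definition. `skewness` is a hypothetical helper with no small-sample correction. Library functions may apply such a correction, so their values can differ slightly.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 10]   # hypothetical: one long right tail

print(skewness(symmetric))              # 0.0
print(skewness(right_skewed) > 0)       # True
# Mean (24/7, about 3.43) exceeds the median (3), as expected for right skew.
print(sum(right_skewed) / len(right_skewed) > statistics.median(right_skewed))  # True
```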
Kurtosis
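Measures "tailedness": how much of the variability comes from extreme values. It is often loosely described as peakedness, but heavy tails, not a sharp peak, are what drive high kurtosis. High kurtosis = heavy tails, low kurtosis = light tails.
Mesokurtic: Normal distribution (kurtosis 3, excess kurtosis 0)
Leptokurtic: Heavy tails (excess kurtosis > 0)
Platykurtic: Light tails (excess kurtosis < 0), often with a flatter peak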
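The sketch below computes excess kurtosis as m4 / m2² - 3, using population central moments. This makes the normal distribution score about 0. `excess_kurtosis` is a hypothetical helper. Library implementations may report plain kurtosis instead of excess kurtosis, or apply a bias correction.

```python
def excess_kurtosis(data):
    """m4 / m2**2 - 3 using population central moments; about 0 for normal data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

light_tails = [1, 2, 3, 4, 5]        # evenly spread values: platykurtic
heavy_tails = [0] * 8 + [-10, 10]    # mostly zeros plus two extremes: leptokurtic

print(round(excess_kurtosis(light_tails), 6))  # -1.3
print(excess_kurtosis(heavy_tails))            # 2.0
```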
Percentiles
Values below which a given percentage of observations fall. P50 = median, P25 = Q1, P75 = Q3.
Example, P90: about 90% of values are below it and about 10% of values are above it.
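Percentiles can be computed in Python 3.8+ with `statistics.quantiles(data, n=100)`, which returns the 99 cut points P1 through P99. The sketch below uses the hypothetical integers 1 to 100. It checks that about 90% of the values fall below P90 and that P50 equals the median. The exact percentile values depend on the interpolation method. The default here is "exclusive".

```python
import statistics

data = list(range(1, 101))                         # hypothetical data: 1, 2, ..., 100
percentiles = statistics.quantiles(data, n=100)    # 99 cut points: P1 .. P99
p90 = percentiles[89]                              # P90 is the 90th cut point
p50 = percentiles[49]                              # P50 is the 50th cut point
share_below = sum(x < p90 for x in data) / len(data)

print(p90, share_below)                  # 90.9 0.9
print(p50 == statistics.median(data))    # True
```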
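Box Plot Elements: A box plot summarizes the median, Q1, Q3, and potential outliers. The box spans the IQR (Q1 to Q3), with a line at the median. The whiskers extend to the most extreme data points that lie within 1.5×IQR of the box, that is, not below Q1 - 1.5×IQR and not above Q3 + 1.5×IQR. Points beyond the whiskers are plotted individually as potential outliers. When there are no outliers, the whiskers reach the minimum and maximum.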
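The sketch below computes the numbers a box plot draws under the common 1.5×IQR whisker convention. `box_plot_summary` is a hypothetical helper, and the data are hypothetical. Quartiles come from `statistics.quantiles` (Python 3.8+, default "exclusive" method). Plotting libraries may compute quartiles differently.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, whisker ends, and points beyond 1.5×IQR from the box."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }

summary = box_plot_summary([10, 20, 25, 30, 35, 40, 500])   # hypothetical data
print(summary["whisker_low"], summary["whisker_high"], summary["outliers"])  # 10 40 [500]
```

Here the upper whisker stops at 40 rather than reaching the maximum of 500, because 500 lies beyond Q3 + 1.5×IQR.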
Real-World Applications
Descriptive statistics are used across numerous fields for data analysis and decision-making:
Business & Finance
- Sales analysis and forecasting
- Financial risk assessment
- Market research analysis
- Quality control metrics
- Customer behavior analysis
Healthcare & Medicine
- Clinical trial results analysis
- Patient vital statistics
- Epidemiological studies
- Treatment effectiveness
- Medical research data
Education & Research
- Test score analysis
- Academic performance tracking
- Survey data summarization
- Research data analysis
- Educational assessment
Science & Engineering
- Experimental data analysis
- Quality assurance testing
- Process control monitoring
- Measurement error analysis
- Scientific research data
Social Sciences
- Demographic data analysis
- Survey research statistics
- Psychological test scoring
- Sociological research data
- Political polling analysis
Sports Analytics
- Player performance statistics
- Team performance analysis
- Game strategy optimization
- Talent scouting metrics
- Injury prevention analysis
Solved Examples
Step-by-step solutions to common descriptive statistics problems:
Practice Problems
Test your understanding with these practice problems:
Solution:
Mean = (15+18+22+22+25+28+30)/7 = 160/7 ≈ 22.86
Median = middle value (4th) = 22
Mode = 22 (appears twice, most frequent)
Mean (22.86) > Median = Mode (22), suggesting a slight right skew.
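These results can be checked with Python's standard `statistics` module. The sketch below uses the seven values from the solution above.

```python
import statistics

data = [15, 18, 22, 22, 25, 28, 30]
print(sum(data))                          # 160
print(round(statistics.mean(data), 2))    # 22.86
print(statistics.median(data))            # 22
print(statistics.mode(data))              # 22
```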
Solution:
Range = 30 - 10 = 20
Mean = (10+15+20+25+30)/5 = 100/5 = 20
Sample variance = [(10-20)²+(15-20)²+(20-20)²+(25-20)²+(30-20)²] / (5-1)
= (100+25+0+25+100)/4 = 250/4 = 62.5
Standard Deviation = √62.5 ≈ 7.91
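This solution can be verified with Python's `statistics` module. Here `statistics.variance` and `statistics.stdev` use the sample (n-1) divisor, matching the calculation above.

```python
import statistics

data = [10, 15, 20, 25, 30]
data_range = max(data) - min(data)        # 30 - 10 = 20
mean = statistics.mean(data)              # 100 / 5 = 20
sample_var = statistics.variance(data)    # 250 / (5 - 1) = 62.5
sample_sd = statistics.stdev(data)        # square root of 62.5
print(data_range, mean, sample_var, round(sample_sd, 2))
```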
Solution:
10 values: using the (n+1)p method, the quartile positions are 11×0.25 = 2.75 and 11×0.75 = 8.25
Q1 = 15 + 0.75×(18-15) = 15 + 2.25 = 17.25
Q3 = 28 + 0.25×(30-28) = 28 + 0.5 = 28.5
IQR = Q3 - Q1 = 28.5 - 17.25 = 11.25
Outlier boundaries: 17.25 - 1.5×11.25 = 0.375 and 28.5 + 1.5×11.25 = 45.375
No outliers in this dataset.
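The dataset for this problem is not shown. The sketch below therefore uses a hypothetical 10-value dataset. It was chosen only so that its 2nd and 3rd values are 15 and 18 and its 8th and 9th values are 28 and 30. Python's `statistics.quantiles` (default `method="exclusive"`) places quartiles at positions (n+1)p, which for n = 10 are 2.75 and 8.25. The sketch reproduces the interpolation and the 1.5×IQR outlier fences.

```python
import statistics

# Hypothetical data, not the original problem's dataset.
data = [12, 15, 18, 20, 22, 24, 26, 28, 30, 33]

q1, q2, q3 = statistics.quantiles(data, n=4)   # positions 2.75, 5.5, 8.25
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(q1, q3, iqr)     # 17.25 28.5 11.25
print(lower, upper)    # 0.375 45.375
print(outliers)        # []
```

Passing `method="inclusive"` would use a different position rule and give different quartiles. The method should be stated alongside the results.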
Solution:
Mean (50) > Median (48) > Mode (45)
This ordering is the typical pattern for right-skewed (positively skewed) distributions.
The tail on the right side is longer, pulling the mean above the median.
The skewness coefficient would most likely be positive.
Solution:
Standard Deviation = √64 = 8
84 is (100-84)/8 = 16/8 = 2 standard deviations below mean
116 is (116-100)/8 = 16/8 = 2 standard deviations above mean
Using the Empirical Rule (assuming the data are approximately normal): about 95% of data falls within ±2 standard deviations
Therefore, approximately 95% of data falls between 84 and 116.
How to Calculate Descriptive Statistics Step-by-Step
Follow this systematic approach to perform comprehensive descriptive statistics analysis:
Organize Your Data
Collect and organize your dataset. Sort values in ascending order for easier calculation of median and quartiles.
Sorted: [8, 10, 15, 17, 22]
Calculate Central Tendency
Compute mean (average), median (middle value), and mode (most frequent value).
Mean = Σx / n
Median = middle value
Mode = most frequent
Calculate Measures of Spread
Compute range, variance, and standard deviation to understand data dispersion.
Variance = Σ(x - mean)² / N for a population, or / (n-1) for a sample
Std Dev = √Variance
Determine Quartiles
Calculate Q1 (25th percentile), Q2 (median, 50th), Q3 (75th percentile), and IQR.
Q1 = value at 25% position
Q3 = value at 75% position
IQR = Q3 - Q1
Check for Outliers
Identify outliers using IQR method: values below Q1-1.5×IQR or above Q3+1.5×IQR.
Lower bound = Q1 - 1.5×IQR
Upper bound = Q3 + 1.5×IQR
Interpret Results
Analyze what the statistics tell you about your data's center, spread, and distribution shape.
Use std dev for spread interpretation
Check IQR for middle 50% spread
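The steps above can be sketched in Python 3.8+ as a single hypothetical `describe` helper, applied to the sorted example data [8, 10, 15, 17, 22]. It uses the sample variance by default and `statistics.quantiles`' default "exclusive" quartile method. Both are assumptions that should be reported with the results. Interpretation (step 6) remains a judgment call; the loop only prints the numbers.

```python
import math
import statistics

def describe(data, sample=True):
    """Summary statistics following the six steps (hypothetical helper)."""
    ordered = sorted(data)                                    # Step 1: organize
    var = statistics.variance(ordered) if sample else statistics.pvariance(ordered)
    q1, _, q3 = statistics.quantiles(ordered, n=4)            # Step 4: quartiles
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr             # Step 5: outlier bounds
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),                     # Step 2: center
        "median": statistics.median(ordered),
        "modes": statistics.multimode(ordered),               # all values if every count ties
        "range": ordered[-1] - ordered[0],                    # Step 3: spread
        "variance": var,
        "std_dev": math.sqrt(var),
        "q1": q1, "q3": q3, "iqr": iqr,
        "outliers": [x for x in ordered if x < lower or x > upper],
    }

stats = describe([17, 8, 22, 10, 15])
for key, value in stats.items():                              # Step 6: inspect and interpret
    print(key, value)
```

For this data, the mean (14.4) is slightly below the median (15), every value appears once, and no point falls outside the 1.5×IQR bounds.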
Pro Tips for Descriptive Statistics
- Always visualize: Create histograms or box plots to complement numerical statistics
- Check assumptions: Different statistics assume different data characteristics
- Consider outliers: Decide whether to include, exclude, or transform outliers based on context
- Use appropriate measures: Mean for symmetric data, median for skewed data
- Report with context: Always report sample size and whether statistics are for sample or population
- Check distribution: Assess normality for parametric tests and confidence intervals
Descriptive Statistics FAQs (Mean, Median, Standard Deviation & More)
Common questions about descriptive statistics, data analysis, and statistical measures.