Aviation Accidents and the Patterns

This IEEE-format paper applies predictive modeling and statistical analysis to nearly 80,000 aviation accident records from the National Transportation Safety Board (NTSB) database, spanning 1962 to 2016. The goal was to uncover decade-by-decade patterns in aviation accidents and quantify the statistical significance of key independent variables: weather, flight schedule, phase of flight, engine type, number of engines, and amateur-built aircraft.

Analysis was conducted in IBM SPSS using factor analysis, correlation analysis, cross-tabulation, and descriptive statistics. Python was used for data preprocessing to identify the decade and season for each accident record.

Paper

aviation_accidents_ieee.pdf

Dataset

  • Accident records: 77,975
  • Variables: 31
  • Time span: 1962–2016
  • Data source: NTSB

Key Findings

  • 2.7% of accidents in the 2010s occurred during cruise, down from 16.4% in the 1980s. Cruise flight has become markedly safer, most likely thanks to modern cockpit technology.
  • 76% of fatal aviation accidents involved non-scheduled flights, a pattern that held in every decade in the dataset.
  • 22% of all fatal injuries were weather-related, even though weather accounted for only about 7.4% of general aviation accidents: a disproportionate share of fatalities.
  • 83% of aviation accidents involved single-engine piston aircraft. Reciprocating engines accounted for 90.1% of all accident records.

Methods

Data Preprocessing with Python

A Python script was written to parse each accident record, extract the event date, and assign a Decade_Identifier (e.g., 1980, 1990, 2000, 2010) and Season_of_Year. Variables like aircraft category, engine type, weather condition, and phase of flight were recoded into numeric IDs for use in SPSS.
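The original script is not reproduced here, so the following is only a minimal sketch of the kind of preprocessing described above. The field name `Event_Date`, the `%Y-%m-%d` date format, the Northern Hemisphere meteorological season boundaries, and the helper names are all assumptions made for illustration. Only `Decade_Identifier`, `Season_of_Year`, and `Weather_Condition` come from the project itself.

```python
from datetime import date, datetime

# Assumption: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def decade_identifier(event_date: date) -> int:
    """Return the first year of the decade, e.g. 1987 -> 1980."""
    return event_date.year // 10 * 10


def season_of_year(event_date: date) -> str:
    """Map the event month to a season label."""
    return SEASON_BY_MONTH[event_date.month]


def build_codebook(values) -> dict:
    """Assign a numeric ID (starting at 1) to each distinct category label."""
    return {label: code for code, label in enumerate(sorted(set(values)), start=1)}


def preprocess(records: list, date_format: str = "%Y-%m-%d") -> list:
    """Add decade, season, and a numeric weather ID to each record."""
    weather_ids = build_codebook(r["Weather_Condition"] for r in records)
    rows = []
    for r in records:
        event_date = datetime.strptime(r["Event_Date"], date_format).date()
        rows.append({
            **r,
            "Decade_Identifier": decade_identifier(event_date),
            "Season_of_Year": season_of_year(event_date),
            "Weather_Condition_ID": weather_ids[r["Weather_Condition"]],
        })
    return rows


# Two made-up records, for illustration only.
sample = [
    {"Event_Date": "1987-06-15", "Weather_Condition": "VMC"},
    {"Event_Date": "2011-12-03", "Weather_Condition": "IMC"},
]
for row in preprocess(sample):
    print(row)
```

Building the codebook from the sorted set of labels keeps the numeric IDs reproducible between runs, which matters when the recoded file is handed to a separate tool such as SPSS.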

Factor Analysis

Principal Component Analysis with Oblimin rotation was applied to identify clusters of correlated independent variables. Three factors emerged:

  • Factor 1 (Aircraft Make): Amateur_Built, Engine_Type, Number_of_Engines
  • Factor 2 (Injury Severity): Total_Fatal_Injuries, Total_Serious_Injuries, Decade
  • Factor 3 (Weather): Weather_Condition, Broad_Phase_of_Flight
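The factor analysis itself was run in SPSS. As a rough illustration of the idea behind it, the sketch below extracts only the first principal component of a correlation matrix using power iteration, in plain Python. This is a simplified stand-in, not the SPSS procedure: it does not perform Oblimin rotation or extract multiple factors, and the correlation values are made up for the example rather than taken from the paper.

```python
from math import sqrt


def first_principal_component(corr, iterations=500):
    """Return (eigenvalue, loadings) for the leading component of a correlation matrix."""
    n = len(corr)
    # Non-uniform start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue because v has unit length.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(n) for j in range(n))
    # Loadings are the eigenvector scaled by the square root of the eigenvalue.
    loadings = [x * sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Made-up correlation matrix for three hypothetical variables (not paper results).
toy_corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
eigenvalue, loadings = first_principal_component(toy_corr)
print(round(eigenvalue, 3), [round(x, 3) for x in loadings])
```

Variables with large loadings on the same component are the ones that would be grouped together, which is the intuition behind the three factors listed above.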

Correlation Analysis

Pearson correlation showed a statistically significant but weak positive relationship between total fatal injuries and total serious injuries (r = 0.265, p < 0.01). The scatter plot showed that the two measures track closely when injury counts are low but diverge at higher totals: some accidents produce many fatalities with few serious injuries, and vice versa.
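For readers who want to see what the coefficient measures, here is a minimal plain-Python sketch of Pearson's r. The function name and the injury counts are illustrative assumptions; the reported value of 0.265 came from SPSS on the full dataset and is not reproduced by this toy input.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up per-accident injury counts, for illustration only.
fatal = [0, 0, 1, 0, 2, 0, 6, 1]
serious = [0, 1, 0, 2, 1, 0, 0, 3]
print(pearson_r(fatal, serious))
```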

Cross-tabulation and Chi-Square

Cross-tabulations were used to examine accident patterns across decades, flight schedules, weather conditions, and engine types. A chi-square test indicated a statistically significant association between weather condition and fatal injuries (χ² = 877.829, p < 0.001).
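The chi-square statistic compares the observed counts in a cross-tabulation with the counts expected if the two variables were independent. The sketch below shows that calculation in plain Python. The function name and the 2×2 table are made up for illustration; it does not reproduce the paper's table or its statistic of 877.829.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = two weather categories, columns = fatal / non-fatal.
toy_table = [
    [10, 20],
    [20, 10],
]
statistic, dof = chi_square(toy_table)
print(statistic, dof)
```

With one degree of freedom, a statistic above 3.841 is significant at the 0.05 level. Turning the statistic into an exact p-value requires the chi-square distribution, which SPSS reports directly.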

Phase of Flight Analysis

Takeoff (19.1%) and landing (29.1%) were the most accident-prone phases across the full dataset. In the 2010s, approximately 64% of accidents occurred during landing, approach, maneuvering, or takeoff. The one notable decade-over-decade improvement was cruise-phase safety: cruise accidents fell from 16.4% of accidents in the 1980s to just 2.7% in the 2010s, a decline most plausibly explained by advances in navigation and autopilot systems.
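Percentages like these are column shares from a decade-by-phase cross-tabulation. As a sketch of how such shares could be computed outside SPSS, the snippet below counts phases per decade and converts the counts to percentages. The function name, the phase labels, and the records are illustrative assumptions, not values from the dataset.

```python
from collections import Counter, defaultdict


def phase_share_by_decade(records):
    """Return {decade: {phase: percent of that decade's accidents}}."""
    counts = defaultdict(Counter)
    for decade, phase in records:
        counts[decade][phase] += 1
    shares = {}
    for decade, phases in counts.items():
        total = sum(phases.values())
        shares[decade] = {phase: 100 * n / total for phase, n in phases.items()}
    return shares


# Made-up (decade, phase) pairs, for illustration only.
toy_records = [
    (1980, "CRUISE"), (1980, "LANDING"), (1980, "LANDING"), (1980, "TAKEOFF"),
    (2010, "LANDING"), (2010, "LANDING"), (2010, "TAKEOFF"), (2010, "APPROACH"),
]
for decade, shares in sorted(phase_share_by_decade(toy_records).items()):
    print(decade, shares)
```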

Weather and Fatal Injuries

Only 7.4% of general aviation accidents were weather-related, yet those accidents accounted for 22% of all fatal injuries. The chi-square result (χ² = 877.829, p < 0.001) indicates that the association is very unlikely to be due to chance. Weather-related accidents declined from 8.3% in the 1980s to 5.1% in the 2010s, with amateur-built aircraft showing lower weather-related rates (1.5% in the 2010s) compared to non-amateur-built aircraft (5.7%).
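The size of the disproportion can be made concrete with a little arithmetic. The sketch below assumes that the 7.4% and 22% figures describe the same population of accidents, which this summary does not state explicitly, so the results are indicative only. The function names are hypothetical.

```python
def share_ratio(accident_share, fatality_share):
    """Ratio of a group's share of fatal injuries to its share of accidents."""
    return fatality_share / accident_share


def relative_fatalities_per_accident(accident_share, fatality_share):
    """Fatal injuries per accident inside the group relative to outside it."""
    inside = fatality_share / accident_share
    outside = (1 - fatality_share) / (1 - accident_share)
    return inside / outside


# Figures quoted above: 7.4% of accidents, 22% of fatal injuries.
print(round(share_ratio(0.074, 0.22), 2))
print(round(relative_fatalities_per_accident(0.074, 0.22), 2))
```

Under that assumption, weather-related accidents carry roughly three times their proportional share of fatal injuries, and about 3.5 times as many fatal injuries per accident as non-weather accidents.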

Tech Stack

  • Data source: NTSB Aviation Accident Database (77,975 records, 31 variables)
  • Analysis: IBM SPSS; factor analysis, correlation, cross-tabulation, descriptive statistics
  • Preprocessing: Python; decade and season classification from event dates
  • Format: IEEE conference paper

Reflection

This paper was written during the Big Data track of my Master's program at Washington University in St. Louis. The most surprising finding was the weather disproportionality: weather causes a small fraction of accidents but a much larger share of fatalities, which has direct implications for how safety interventions should be prioritized.

The experience of writing Python preprocessing scripts alongside SPSS statistical workflows reinforced how important clean, well-structured data is before any analysis begins. It also introduced me to factor analysis as a dimensionality reduction technique, which connects directly to the topic modeling work I had done at UIUC a few years earlier.
