Predicting Flight Delays at Scale

Machine learning analysis of 31M+ flights to predict departure delays using weather, temporal, and historical patterns

31M+

Flights Analyzed

2015-2019

Date Range

F1: 0.44

Random Forest (Best Balance)

UC Berkeley MIDS W261 - Machine Learning at Scale


The $28B Problem

Flight delays cost the U.S. economy billions annually, affecting passengers, airlines, and the broader transportation system.

$28B

Annual Cost

Economic impact on passengers and airlines

20%

Flights Delayed

Roughly one in five flights departs 15+ minutes late

45 min

Average Delay

Time lost per delayed flight

Missed Connections

Cascading delays across networks

Hidden Costs

Hotels, meals, rebooking fees

Lost Productivity

Business meetings and deadlines

Operational Strain

Crew scheduling, gate management

Can we predict which flights will be delayed before they depart?

The Data

Combining flight records, weather observations, and station metadata to build a comprehensive prediction model.

Flights

31,746,841

records

Period: 2015-2019
Fields: 109 columns
Target: DEP_DEL15

Weather

~630M

observations

Stations: 630 mapped
Frequency: Hourly
Features: Wind speed, visibility, temperature

Stations

5,873

weather stations

Mapped to airports: 630
Coverage: Continental US
Matching: Nearest station
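The station metadata matches each airport to its nearest weather station. A minimal plain-Python sketch of that lookup, assuming great-circle (haversine) distance on latitude/longitude. The station tuples, IDs, and function names are hypothetical; the project's actual matching code is not shown on this page.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(airport_lat, airport_lon, stations):
    """Return (station_id, distance_km) for the closest station.

    stations: list of (station_id, lat, lon) tuples (hypothetical layout).
    Brute force: one distance per station, cheap for a few thousand stations.
    """
    distances = (
        (sid, haversine_km(airport_lat, airport_lon, lat, lon)) for sid, lat, lon in stations
    )
    return min(distances, key=lambda pair: pair[1])


# Toy usage with made-up station IDs near Atlanta (ATL is roughly 33.64 N, 84.43 W).
stations = [("STN_A", 33.63, -84.44), ("STN_B", 34.00, -84.00)]
print(nearest_station(33.64, -84.43, stations)[0])  # STN_A
```

Brute force keeps the sketch simple: the airport-to-station mapping only has to be computed once, so comparing each airport against every station is affordable.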

Raw Data → Clean & Join → Feature Engineering → ML Models
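One way the "Clean & Join" step can attach weather to flights without leaking future information is an as-of join: each flight takes the most recent observation from its matched station that was available at prediction time. The sketch below assumes a 2-hour cutoff (matching the prediction horizon stated in Results) and sorted hourly observations. Function names and toy values are hypothetical, and at 31M flights the same logic would run as a distributed join in PySpark rather than as per-flight lookups.

```python
import bisect
from datetime import datetime, timedelta

# Assumed cutoff: weather must be observable 2 hours before scheduled departure.
PREDICTION_LEAD = timedelta(hours=2)


def asof_weather(sched_dep, obs_times, obs_values, lead=PREDICTION_LEAD):
    """Return the latest observation taken at or before sched_dep - lead, or None.

    obs_times: ascending list of datetimes for one station.
    obs_values: observation values aligned with obs_times.
    """
    cutoff = sched_dep - lead
    # bisect_right places the insertion point after any observation equal to cutoff.
    idx = bisect.bisect_right(obs_times, cutoff) - 1
    return obs_values[idx] if idx >= 0 else None


# Hourly wind-speed readings (made-up values) for one station.
obs_times = [datetime(2019, 1, 1, h) for h in (8, 9, 10)]
wind_speed = [5, 7, 12]
print(asof_weather(datetime(2019, 1, 1, 11, 30), obs_times, wind_speed))  # 7 (09:00 reading)
print(asof_weather(datetime(2019, 1, 1, 9, 0), obs_times, wind_speed))    # None (nothing by 07:00)
```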

Discovery

Exploring patterns in 31 million flights to uncover what drives delays.

Which Airlines Delay Most?

Delay rates vary significantly across carriers. Frontier and JetBlue show the highest delay percentages, while Hawaiian Airlines performs best. This suggests operational factors beyond weather play a significant role.

💡 Low-cost carriers tend to have tighter turnaround times, correlating with higher delay rates.

Figure: Percentage of flights delayed by carrier (2015-2019)
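The per-carrier delay rate in the chart is a grouped mean of the DEP_DEL15 label. A plain-Python sketch, assuming rows with a missing label (such as cancellations) are excluded. The carrier codes are real IATA codes (F9 for Frontier, HA for Hawaiian), but the rows are made-up toy data.

```python
from collections import defaultdict


def delay_rate_by_carrier(flights):
    """Percent of flights with DEP_DEL15 == 1, per carrier.

    flights: iterable of (carrier_code, dep_del15) pairs. Rows whose label is None
    (for example cancelled flights) are skipped.
    """
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for carrier, dep_del15 in flights:
        if dep_del15 is None:
            continue
        totals[carrier] += 1
        delayed[carrier] += dep_del15
    return {c: 100.0 * delayed[c] / totals[c] for c in totals}


toy = [("F9", 1), ("F9", 0), ("HA", 0), ("HA", 0), ("HA", 1), ("HA", 0), ("F9", None)]
print(delay_rate_by_carrier(toy))  # {'F9': 50.0, 'HA': 25.0}
```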

Geographic Patterns

Flight volume concentrates around major hubs: Atlanta, Chicago O'Hare, Dallas-Fort Worth, Denver, and Los Angeles. These hub airports see the most flights but also experience cascading delay effects.

📍 Hub airports create network effects where a single delay can propagate across the system.

Figure: Flight volume by airport location

The Correlation Story

Feature correlation analysis reveals which variables have the strongest relationships with delays. Previous flight delays show the highest correlation: a delayed inbound aircraft often means a delayed outbound departure.

🔗 Aircraft utilization patterns create delay chains. The same plane serves multiple flights per day.

Figure: Feature correlation matrix for delay prediction
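The page does not say which correlation measure the heatmap uses. Assuming Pearson correlation, one cell of the matrix can be computed as in the sketch below. For two binary columns such as DEP_DEL15_PREV and DEP_DEL15 it reduces to the phi coefficient. The labels are toy values, not project data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    For two 0/1 columns this equals the phi coefficient.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


prev_delayed = [1, 1, 0, 0, 0, 0, 1, 0]  # toy labels, not project data
curr_delayed = [1, 1, 0, 0, 1, 0, 0, 0]
print(round(pearson(prev_delayed, curr_delayed), 3))  # 0.467
```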

The Previous Flight Effect

The strongest predictor of delay is whether the previous flight on the same aircraft was delayed. This single feature captures operational dependencies that weather data alone cannot explain.

🎯 DEP_DEL15_PREV becomes our most important feature, accounting for 17.2% of total feature importance.

Figure: Impact of previous flight delay status
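DEP_DEL15_PREV can be derived by ordering each aircraft's flights by scheduled departure and carrying forward the previous leg's label. A plain-Python sketch, assuming a tail-number field identifies the aircraft. The field names and toy rows are hypothetical. In PySpark the same idea is usually written as a lag over a window partitioned by tail number and ordered by scheduled departure.

```python
from itertools import groupby
from operator import itemgetter


def add_prev_delay(flights):
    """Attach dep_del15_prev: the DEP_DEL15 label of the same aircraft's previous flight.

    flights: list of dicts with 'tail_num', 'sched_dep' (any sortable timestamp)
    and 'dep_del15'. The first flight seen for each aircraft gets None.
    """
    ordered = sorted(flights, key=itemgetter("tail_num", "sched_dep"))
    result = []
    for _, legs in groupby(ordered, key=itemgetter("tail_num")):
        prev_label = None
        for leg in legs:
            result.append({**leg, "dep_del15_prev": prev_label})
            prev_label = leg["dep_del15"]
    return result


# Toy rows; sched_dep is an hhmm time within a single day.
flights = [
    {"tail_num": "N100", "sched_dep": 1200, "dep_del15": 0},
    {"tail_num": "N100", "sched_dep": 800, "dep_del15": 1},
    {"tail_num": "N200", "sched_dep": 900, "dep_del15": 0},
]
for row in add_prev_delay(flights):
    print(row["tail_num"], row["sched_dep"], row["dep_del15_prev"])
# N100 800 None
# N100 1200 1
# N200 900 None
```

This sketch does not check whether the previous leg's outcome was already known at prediction time; with a 2-hour horizon, a stricter version would only use labels from legs that departed before the cutoff.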

Armed with these insights, we designed a prediction pipeline.

The Approach

A scalable machine learning pipeline built on PySpark and TensorFlow.

Solution Architecture

Feature Engineering

From 35 candidate features, we selected the top 6 based on importance analysis. A model using only these 6 features achieves 94% of the full model's performance while reducing computational complexity.
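Selecting the top features amounts to sorting by importance and keeping the first k. The sketch below is illustrative only: because this page reports importances for just six of the 35 candidates, it ranks those six and keeps three. The six reported importances sum to 75.5% (17.2 + 14.8 + 13.1 + 11.5 + 10.2 + 8.7), so the 94% figure describes how much of the full model's performance the reduced model retains, not a share of importance. The function name is hypothetical.

```python
def select_top_k(importances, k):
    """Keep the k most important features.

    importances: {feature_name: importance}. Returns (names, share), where share is
    the fraction of the summed importance that the kept features account for.
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:k]
    share = sum(value for _, value in kept) / sum(importances.values())
    return [name for name, _ in kept], share


# The six importance values reported on this page (percent of total importance).
reported = {
    "DEP_DEL15_PREV": 17.2,
    "CRS_DEP_TIME_bucket": 14.8,
    "OD_GROUP": 13.1,
    "wnd_speed": 11.5,
    "vis_distance": 10.2,
    "dest_tmp": 8.7,
}
names, share = select_top_k(reported, 3)
print(names)  # ['DEP_DEL15_PREV', 'CRS_DEP_TIME_bucket', 'OD_GROUP']
```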

Selected Features

1. DEP_DEL15_PREV
2. CRS_DEP_TIME_bucket
3. OD_GROUP
4. wnd_speed
5. vis_distance
6. dest_tmp

Feature Insights

DEP_DEL15_PREV

Previous flight delay status, our most powerful predictor

CRS_DEP_TIME_bucket

Scheduled departure time binned into operational periods

OD_GROUP

Origin-destination pair encoding route characteristics

wnd_speed, vis_distance, dest_tmp

Weather conditions at departure and arrival airports
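Two of the engineered features are simple transforms of raw columns. The sketch below bins an hhmm scheduled departure time and builds a route key. The bucket edges and labels are hypothetical because the page does not list the operational periods used. The real OD_GROUP may also group routes further in ways the page does not describe; here it is just the concatenated airport pair.

```python
# Hypothetical bucket edges (minutes after midnight). The page does not list the
# operational periods actually used for CRS_DEP_TIME_bucket.
BUCKETS = [
    (0, 360, "overnight"),     # 00:00-05:59
    (360, 720, "morning"),     # 06:00-11:59
    (720, 1080, "afternoon"),  # 12:00-17:59
    (1080, 1440, "evening"),   # 18:00-23:59
]


def dep_time_bucket(crs_dep_time):
    """Map a scheduled departure in hhmm form (e.g. 1435 for 14:35) to a period label."""
    hours, minutes = divmod(int(crs_dep_time), 100)
    if minutes >= 60 or not 0 <= hours <= 24:
        raise ValueError(f"invalid hhmm time: {crs_dep_time}")
    total = (hours % 24) * 60 + minutes  # treat 2400 as midnight
    for start, end, label in BUCKETS:
        if start <= total < end:
            return label
    raise ValueError(f"no bucket covers {crs_dep_time}")


def od_group(origin, dest):
    """Simplest possible route key for an origin-destination pair."""
    return f"{origin}-{dest}"


print(dep_time_bucket(1435), od_group("ATL", "LAX"))  # afternoon ATL-LAX
```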

Model Selection

Logistic Regression

Baseline model with L2 regularization. Fast training, interpretable coefficients.

Baseline

Random Forest

Ensemble of 100 trees with max depth 10. Best balance of performance and speed.

Best Balance

Neural Network

Feed-forward network with 3 hidden layers. TensorFlow on GPU for training.

Deep Learning
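For reference, the standard objective minimized by L2-regularized logistic regression is shown below, where y_i is the DEP_DEL15 label (0 or 1) of flight i, x_i its feature vector, and σ the logistic function. The regularization strength λ used in the project is not stated on this page, and some implementations scale the penalty by 1/2.

```latex
$$
\min_{\mathbf{w},\,b}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i \log \sigma(\mathbf{w}^\top \mathbf{x}_i + b)
+ (1 - y_i)\log\big(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i + b)\big)\Big]
+ \lambda \lVert \mathbf{w} \rVert_2^2,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$
```

Larger λ shrinks the coefficients toward zero, trading a little training fit for stability and keeping the coefficients interpretable.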

Data Engineering Pipeline

Results

Comparing model performance on predicting flight delays 2 hours before departure.

Compare Models


Model	F1 Score	AUC-ROC	Strengths
Logistic Regression	0.41	0.70	Fast, interpretable
Random Forest	0.44	0.72	Best balance
Neural Network	0.49	0.73	Highest F1 and AUC

F1 Score Comparison

Logistic Regression: 0.41
Random Forest: 0.44
Neural Network: 0.49

Key Finding

Top 6 features achieve 94% of full model performance

Feature selection reduced training time by 85% while retaining most of the full model's predictive performance.

Top Feature Importance

DEP_DEL15_PREV 17.2%

CRS_DEP_TIME_bucket 14.8%

OD_GROUP 13.1%

wnd_speed 11.5%

vis_distance 10.2%

dest_tmp 8.7%

Understanding the Metrics

F1 Score

Harmonic mean of precision and recall. Balances catching delays (recall) with avoiding false alarms (precision).

AUC-ROC

Area under the receiver operating characteristic (ROC) curve. Measures the model's ability to distinguish delayed from on-time flights across all decision thresholds.

Why These Matter

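Imbalanced classes (20% delays) make accuracy misleading: a model that predicts "on time" for every flight is already about 80% accurate while catching no delays. F1 and AUC provide more meaningful evaluation.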
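These metrics can be computed from predictions in a few lines. The plain-Python sketch below uses toy labels with a 20% delay rate to show why accuracy misleads: an always-on-time classifier and one that catches half the delays have identical accuracy but very different F1. AUC is computed as the probability that a randomly chosen delayed flight is scored above a randomly chosen on-time one. This pairwise version is O(n²) and only suitable for small examples. Function names and data are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = delayed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """P(score of a random delayed flight > score of a random on-time flight), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]          # toy labels, 20% delayed
always_on_time = [0] * 10
half_caught = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]      # 1 TP, 1 FN, 1 FP, 7 TN
print(classification_metrics(y_true, always_on_time))  # accuracy 0.8, f1 0.0
print(classification_metrics(y_true, half_caught))     # accuracy 0.8, f1 0.5
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]
print(auc_roc(y_true, scores))  # 0.9375
```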
