Predicting Flight Delays at Scale

Machine learning analysis of 31M+ flights to predict departure delays using weather, temporal, and historical patterns

31M+
Flights Analyzed
2015-2019
Date Range
F1: 0.44
Best Score

UC Berkeley MIDS W261 - Machine Learning at Scale

The $28B Problem

Flight delays cost the U.S. economy billions annually, affecting passengers, airlines, and the broader transportation system.

$28B
Annual Cost

Economic impact on passengers and airlines

20%
Flights Delayed

One in five flights arrives late

45 min
Average Delay

Time lost per delayed flight

Missed Connections

Cascading delays across networks

Hidden Costs

Hotels, meals, rebooking fees

Lost Productivity

Business meetings and deadlines

Operational Strain

Crew scheduling, gate management

Can we predict which flights will be delayed before they depart?

The Data

Combining flight records, weather observations, and station metadata to build a comprehensive prediction model.

Flights

31,746,841

records

  • Period: 2015-2019
  • Fields: 109 columns
  • Target: DEP_DEL15
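
The target can be illustrated with a small sketch. In the public on-time performance data, DEP_DEL15 is the indicator for a departure delay of 15 minutes or more; the function name, the sample delays, and the handling of missing values below are assumptions for illustration, not the project's code.

```python
def dep_del15(dep_delay_minutes):
    """Return 1 if the flight departed 15 or more minutes late, else 0.

    Missing delays (for example, cancelled flights) are passed through as
    None; how the project treated them is not stated, so this is an assumption.
    """
    if dep_delay_minutes is None:
        return None
    return 1 if dep_delay_minutes >= 15 else 0


# Illustrative departure delays in minutes (negative means early).
delays = [-3.0, 0.0, 14.0, 15.0, 72.0, None]
labels = [dep_del15(d) for d in delays]
print(labels)
```

Framing the problem as a binary label at the 15-minute threshold is what turns delay prediction into a classification task.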

Weather

~630M

observations

  • Stations: 630 mapped
  • Frequency: Hourly
  • Features: Wind, visibility, temperature

Stations

5,873

weather stations

  • Mapped: 630 to airports
  • Coverage: Continental US
  • Matching: Nearest station

Pipeline: Raw Data → Clean & Join → Feature Engineering → ML Models
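
The "nearest station" matching listed above could be sketched as follows, assuming great-circle (haversine) distance is the matching criterion. The station identifiers and coordinates are hypothetical and approximate, and the distance measure actually used by the project is not stated.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(airport, stations):
    """Return the station dict closest to the airport."""
    return min(
        stations,
        key=lambda s: haversine_km(airport["lat"], airport["lon"], s["lat"], s["lon"]),
    )


# Hypothetical airport and stations with approximate coordinates.
airport = {"code": "SFO", "lat": 37.62, "lon": -122.38}
stations = [
    {"id": "station_a", "lat": 37.72, "lon": -122.22},
    {"id": "station_b", "lat": 37.36, "lon": -121.93},
    {"id": "station_c", "lat": 37.63, "lon": -122.37},
]

match = nearest_station(airport, stations)
print(match["id"])
```

A brute-force scan like this is fine for a few hundred airports against a few thousand stations; a spatial index would only matter at much larger scale.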

Discovery

Exploring patterns in 31 million flights to uncover what drives delays.

Which Airlines Delay Most?

Delay rates vary significantly across carriers. Frontier and JetBlue show the highest delay percentages, while Hawaiian Airlines performs best. This suggests operational factors beyond weather play a significant role.

💡 Low-cost carriers tend to have tighter turnaround times, correlating with higher delay rates.

[Figure: Percentage of flights delayed by carrier (2015-2019)]

Geographic Patterns

Flight volume concentrates around major hubs: Atlanta, Chicago O'Hare, Dallas-Fort Worth, Denver, and Los Angeles. These hub airports see the most flights but also experience cascading delay effects.

📍 Hub airports create network effects where a single delay can propagate across the system.

[Figure: Map of flight volume by airport location across the US]

The Correlation Story

Feature correlation analysis reveals which variables have the strongest relationships with delays. Previous flight delays show the highest correlation: a delayed inbound flight tends to mean a delayed outbound flight.

🔗 Aircraft utilization patterns create delay chains. The same plane serves multiple flights per day.

[Figure: Feature correlation matrix (heatmap) for delay prediction]

The Previous Flight Effect

The strongest predictor of delay is whether the previous flight on the same aircraft was delayed. This single feature captures operational dependencies that weather data alone cannot explain.

🎯 DEP_DEL15_PREV becomes our most important feature, accounting for more than 17% of total feature importance.

[Figure: Impact of previous flight delay status on current flight delay]
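
A previous-flight feature of this kind could be derived roughly as below: group flights by aircraft, order each aircraft's legs by scheduled departure, and carry the prior leg's delay flag forward. The field names (`tail_num`, `sched_dep`) and the default of 0 for an aircraft's first observed leg are assumptions for illustration, not the project's implementation.

```python
from collections import defaultdict


def add_prev_delay(flights):
    """Attach dep_del15_prev: the delay flag of the same aircraft's prior leg."""
    by_tail = defaultdict(list)
    for flight in flights:
        by_tail[flight["tail_num"]].append(flight)

    enriched = []
    for legs in by_tail.values():
        # ISO-8601 timestamps sort chronologically as plain strings.
        legs.sort(key=lambda f: f["sched_dep"])
        prev = None
        for leg in legs:
            # Assumption: the first observed leg of an aircraft gets 0.
            prev_flag = prev["dep_del15"] if prev is not None else 0
            enriched.append({**leg, "dep_del15_prev": prev_flag})
            prev = leg
    return enriched


# Hypothetical flights, deliberately out of order.
flights = [
    {"tail_num": "N101", "sched_dep": "2019-01-01T12:00", "dep_del15": 1},
    {"tail_num": "N202", "sched_dep": "2019-01-01T09:00", "dep_del15": 0},
    {"tail_num": "N101", "sched_dep": "2019-01-01T16:00", "dep_del15": 0},
    {"tail_num": "N101", "sched_dep": "2019-01-01T08:00", "dep_del15": 1},
]

result = {(f["tail_num"], f["sched_dep"]): f["dep_del15_prev"] for f in add_prev_delay(flights)}
print(result)
```

The sketch ignores whether the previous leg's outcome would already be known at prediction time. A pipeline that predicts 2 hours before departure would need to check that before using the flag.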

Armed with these insights, we designed a prediction pipeline.

The Approach

A scalable machine learning pipeline built on PySpark and TensorFlow.

Solution Architecture

[Figure: Solution architecture, an end-to-end ML pipeline running on Azure Databricks, from raw data through feature engineering to model training and prediction]

Feature Engineering

From 35 candidate features, we selected the top 6 based on importance analysis. These six features retain 94% of the full model's performance while reducing computational complexity.

Selected Features

  1. DEP_DEL15_PREV
  2. CRS_DEP_TIME_bucket
  3. OD_GROUP
  4. wnd_speed
  5. vis_distance
  6. dest_tmp

Feature Insights

DEP_DEL15_PREV

Previous flight delay status, our most powerful predictor

CRS_DEP_TIME_bucket

Scheduled departure time binned into operational periods

OD_GROUP

Origin-destination pair encoding route characteristics

wnd_speed, vis_distance, dest_tmp

Wind speed, visibility distance, and destination temperature: weather conditions at the departure and arrival airports
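
The time binning behind CRS_DEP_TIME_bucket might look like the sketch below. It assumes the scheduled departure time is stored as a local hhmm integer, as in the public on-time performance data. The four bucket names and their boundaries are hypothetical, since the project's actual operational periods are not listed here.

```python
def dep_time_bucket(crs_dep_time):
    """Map an hhmm integer (e.g. 1435) to a coarse period of the day."""
    hour = (crs_dep_time // 100) % 24  # treat 2400 as midnight
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


# Illustrative scheduled departure times.
samples = [545, 600, 1435, 2359, 2400]
buckets = [dep_time_bucket(t) for t in samples]
print(buckets)
```

Binning turns a nearly continuous clock time into a small categorical feature, which lets a linear model learn a separate effect for each period.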

Model Selection

Logistic Regression

Baseline model with L2 regularization. Fast training, interpretable coefficients.

Baseline

Neural Network

Feed-forward network with 3 hidden layers, trained with TensorFlow on GPU.

Deep Learning

Data Engineering Pipeline

[Figure: Data engineering (ETL) pipeline processing 31M+ flight records from raw data sources]

Results

Comparing model performance on predicting flight delays 2 hours before departure.

Compare Models

Model                 F1 Score   AUC-ROC   Strengths
Logistic Regression   0.41       0.70      Fast, interpretable
Random Forest         0.44       0.72      Best balance
Neural Network        0.49       0.73      Highest AUC

F1 Score Comparison

Logistic Regression: 0.41
Random Forest: 0.44
Neural Network: 0.49

Key Finding

Top 6 features achieve 94% of full model performance

Feature selection reduced training time by 85% while preserving most of the full model's predictive performance.

Top Feature Importance

DEP_DEL15_PREV 17.2%
CRS_DEP_TIME_bucket 14.8%
OD_GROUP 13.1%
wnd_speed 11.5%
vis_distance 10.2%
dest_tmp 8.7%
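
As a quick arithmetic check on the importances listed above, the sketch below accumulates the six shares. It assumes each percentage is a share of total importance across all 35 candidate features; only the six values themselves come from the table.

```python
# Feature importance shares (percent) as listed above.
importance_pct = {
    "DEP_DEL15_PREV": 17.2,
    "CRS_DEP_TIME_bucket": 14.8,
    "OD_GROUP": 13.1,
    "wnd_speed": 11.5,
    "vis_distance": 10.2,
    "dest_tmp": 8.7,
}

# Running total, rounded to one decimal to avoid floating-point noise.
cumulative = []
running = 0.0
for name, pct in importance_pct.items():
    running += pct
    cumulative.append((name, round(running, 1)))

top6_share = cumulative[-1][1]
remaining_share = round(100.0 - top6_share, 1)
print(cumulative)
print(top6_share, remaining_share)
```

Under that assumption the six features carry 75.5% of total importance, leaving 24.5% spread over the other 29 candidates. This is a different measure from the 94% figure, which compares model performance rather than importance shares.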

Understanding the Metrics

F1 Score

Harmonic mean of precision and recall. Balances catching delays (recall) with avoiding false alarms (precision).

AUC-ROC

Area under the receiver operating characteristic (ROC) curve. Measures the model's ability to distinguish delayed from on-time flights.
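
One way to read AUC-ROC is as the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight. The sketch below computes it directly from that definition by comparing every positive/negative pair. The labels and scores are made up for illustration, and the quadratic loop is only practical for small samples.

```python
def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = delayed, 0 = on time; scores are predicted delay probabilities.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
auc = auc_roc(labels, scores)
print(round(auc, 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the AUC values reported above.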

Why These Matter

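With imbalanced classes (about 20% of flights delayed), accuracy is misleading: a model that predicts every flight as on time would be 80% accurate while catching no delays. F1 and AUC provide more meaningful evaluation.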
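
The point about imbalance can be made concrete with a small sketch on 100 synthetic flights, 20 of them delayed. Both sets of predictions are invented for illustration and are not project results. Precision, recall, and F1 are defined as 0 when their denominators are 0, which is a common convention but a choice nonetheless.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 100 synthetic flights: the first 20 are delayed, the remaining 80 are on time.
y_true = [1] * 20 + [0] * 80

# Predictor A: always says "on time".
always_on_time = [0] * 100

# Predictor B (hypothetical): catches 12 of 20 delays but raises 18 false alarms.
flagging = [1] * 12 + [0] * 8 + [1] * 18 + [0] * 62

baseline = evaluate(y_true, always_on_time)
flagger = evaluate(y_true, flagging)
print(baseline)
print(flagger)
```

Here the always-on-time predictor scores higher on accuracy (0.80 versus 0.74) yet has an F1 of 0, while the second predictor reaches an F1 of about 0.48 because it actually finds delays.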