ML Research Project

Airfare Price Prediction System

End-to-end ML pipeline with XGBoost, SVM, and ensemble methods, reaching 97.9% accuracy on 1,814 real flight records.

97.9% Prediction Accuracy
1,814 Flight Records
6 ML Models
12 raw + 12 engineered Features
Live Demo

Flight Price Predictor

Enter flight details to get an AI-powered price estimate using our trained XGBoost model

💡 Price Insights

Book 30+ days early to save ~15%
Business class is ~2.5× economy
Summer travel adds ~30% premium
Last-minute flights cost ~45% more
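
The rules of thumb above can be read as multipliers on a base economy fare. The sketch below is only an illustration of that arithmetic, not the trained XGBoost model: the function name `rough_fare_estimate`, the example base fare, and the choice to compound the effects multiplicatively are all assumptions, and the 7-day cutoff for "last-minute" is taken from the Booking Timing insight.

```python
def rough_fare_estimate(base_economy_fare, business=False, days_left=15, summer=False):
    """Apply the page's rule-of-thumb adjustments to a base economy fare.

    Hypothetical helper: compounding the effects multiplicatively is an
    assumption; the page only states each effect in isolation.
    """
    fare = float(base_economy_fare)
    if business:
        fare *= 2.5    # business class is ~2.5x economy
    if days_left >= 30:
        fare *= 0.85   # booking 30+ days early saves ~15%
    elif days_left <= 7:
        fare *= 1.45   # last-minute bookings cost ~45% more
    if summer:
        fare *= 1.30   # summer travel adds ~30%
    return round(fare, 2)


if __name__ == "__main__":
    # Made-up base fare of 5,000 in the page's currency (₹).
    print(rough_fare_estimate(5000, days_left=45))
    print(rough_fare_estimate(5000, business=True, days_left=3, summer=True))
```

With a base fare of ₹5,000, an advance economy booking comes out at about ₹4,250, while a last-minute summer business ticket comes out at about ₹23,560.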
🤖

XGBoost (Tuned)

Best performing model

Accuracy: 97.9%
R² Score: 0.98
RMSE: ±₹587
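
For reference, R² and RMSE follow the standard regression definitions. The sketch below computes both in plain Python on a handful of made-up fares; the numbers are illustrative and are not taken from the project's data. The page does not say how the "accuracy" percentage is derived, so it is not reproduced here.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (here ₹)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Made-up fares: every prediction is off by exactly 500.
    actual = [4000, 6000, 8000, 10000]
    predicted = [4500, 5500, 8500, 9500]
    print(r2_score(actual, predicted), rmse(actual, predicted))
```

On this toy data the RMSE is 500 and R² is about 0.95, which shows how an error of a few hundred rupees can coexist with a high R² when fares vary widely.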
Evaluation

Model Performance

6 ML models compared using 5-fold cross-validation and GridSearchCV optimization
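
As a rough picture of what 5-fold cross-validation does with 1,814 records, the sketch below splits row indices into five contiguous folds and uses each fold once as the held-out set. The helper `k_fold_indices` is hypothetical, and whether the project shuffled rows before splitting is not stated, so no shuffling is assumed.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds, no shuffling.

    The first n_samples % k folds receive one extra row so that every row
    lands in exactly one test fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


if __name__ == "__main__":
    for fold, (train_idx, test_idx) in enumerate(k_fold_indices(1814, k=5), start=1):
        print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

Each model is then trained five times, once per split, and its scores are averaged across the held-out folds.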

Accuracy Comparison

Linear Reg: 88% (R² 0.88)
SVM: 18% (R² 0.18)
Random Forest: 95% (R² 0.95)
Gradient Boost: 97% (R² 0.97)
Ensemble: 90% (R² 0.90)
XGBoost (Tuned): 98% (R² 0.98) ⭐ Best

[Chart: XGBoost (Tuned) Metric Radar]
Insights

Feature Importance

XGBoost feature weights showing the top drivers of airfare pricing

1. Class: 42.95%
2. Price/Hour: 21.12%
3. Duration: 12.11%
4. Days Left: 9.76%
5. Duration Hrs: 8.68%
6. Season: 1.19%
7. Total Stops: 0.96%
8. Route Pop.: 0.88%
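
To see how concentrated these weights are, the sketch below accumulates the eight listed importances from largest to smallest. The figures are copied from the ranking above; the helper name `cumulative_share` is hypothetical. The listed weights sum to 97.65%, so the remainder presumably belongs to features not shown on the page.

```python
# Importance percentages as listed in the ranking above.
importances = {
    "Class": 42.95,
    "Price/Hour": 21.12,
    "Duration": 12.11,
    "Days Left": 9.76,
    "Duration Hrs": 8.68,
    "Season": 1.19,
    "Total Stops": 0.96,
    "Route Pop.": 0.88,
}


def cumulative_share(weights):
    """Return (name, weight, running_total) tuples, largest weight first."""
    running = 0.0
    rows = []
    for name, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        running += weight
        rows.append((name, weight, round(running, 2)))
    return rows


if __name__ == "__main__":
    for name, weight, running in cumulative_share(importances):
        print(f"{name:<13} {weight:>6.2f}%  cumulative {running:>6.2f}%")
```

The top five features account for roughly 94.6% of the total importance, while season, stops, and route popularity together contribute about 3%.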
💼

Class Premium

Business class commands a ~2.5× premium over economy, making class the strongest price driver at 43% importance.

📅

Booking Timing

Last-minute bookings (≤7 days) cost ~45% more. Advance booking (30+ days) saves ~15%.

🌞

Seasonal Surge

Summer travel is ~30% pricier than monsoon travel, so season affects fares even though it carries only 1.19% of the model's feature importance.

Architecture

ML Pipeline

End-to-end ML workflow from raw data to production-ready model

📥
01 Data Ingestion

1,814 flight records with 12 raw features across 9 airlines and 9 Indian cities

🧹
02 Data Cleaning

12% missing values filled via median/mode imputation. IQR outlier capping preserves all records
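
The cleaning step can be sketched in plain Python as follows. This is a minimal illustration rather than the project's code: the helper names, the sample values, the 1.5×IQR fences, and the "inclusive" quartile convention are all assumptions. Capping clips extreme values to the fences instead of dropping rows, which is why the record count is preserved.

```python
import statistics


def impute_median(values):
    """Replace None in a numeric column with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def impute_mode(values):
    """Replace None in a categorical column with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = statistics.mode(observed)
    return [mode if v is None else v for v in values]


def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; no rows are removed."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    prices = [4200, None, 5100, 4800, 39000, 5300, None, 4600]
    airlines = ["IndiGo", None, "Air India", "IndiGo"]
    filled = impute_median(prices)
    print(cap_outliers_iqr(filled))
    print(impute_mode(airlines))
```

In this example the single extreme fare of 39,000 is pulled down to the upper fence, and all eight rows remain in the column.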

⚙️
03 Feature Engineering

12 new features: time-based, booking patterns, flight characteristics, efficiency metrics
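
A few features of the kinds named above can be derived as in the sketch below. The page does not list the exact 12 engineered features, so the field names, the 7-day and 30-day booking thresholds (borrowed from the Booking Timing insight), and the month-to-season mapping are all assumptions made for illustration.

```python
from datetime import date


def season_for_month(month):
    """Hypothetical month-to-season mapping; the source does not define its seasons."""
    if month in (3, 4, 5, 6):
        return "summer"
    if month in (7, 8, 9):
        return "monsoon"
    return "winter"


def engineer_features(booking_date, departure_date, duration_minutes):
    """Derive time-based, booking-pattern, and flight-characteristic features."""
    days_left = (departure_date - booking_date).days
    return {
        "days_left": days_left,                          # booking pattern
        "is_last_minute": days_left <= 7,                # booking pattern
        "is_advance": days_left >= 30,                   # booking pattern
        "duration_hrs": round(duration_minutes / 60, 2), # flight characteristic
        "departure_month": departure_date.month,         # time-based
        "is_weekend": departure_date.weekday() >= 5,     # time-based (Sat/Sun)
        "season": season_for_month(departure_date.month),
    }


if __name__ == "__main__":
    print(engineer_features(date(2024, 5, 1), date(2024, 5, 6), 135))
```

Deriving booleans such as `is_last_minute` alongside the raw `days_left` count gives tree models an explicit split point at the thresholds the insights call out.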

📊
04 Encoding & Scaling

Label encoding for categoricals. StandardScaler standardization (zero mean, unit variance) improves model convergence
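
What these two transforms compute can be shown without any library. The functions below are a plain-Python sketch of the arithmetic behind scikit-learn's `LabelEncoder` and `StandardScaler`, not the classes themselves; the function names and sample values are illustrative assumptions.

```python
import statistics


def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


def standardize(values):
    """Rescale to zero mean and unit variance using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    codes, mapping = label_encode(["Economy", "Business", "Economy"])
    print(codes, mapping)
    print(standardize([2.0, 4.0, 6.0]))
```

In a real pipeline the mean, standard deviation, and label mapping would be fitted on the training split only and then reused on validation data, so that no information leaks from the held-out folds.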

🤖
05 Model Training

6 models compared: Linear Reg, SVM, Random Forest, Gradient Boosting, Ensemble, XGBoost

🎯
06 Hyperparameter Tuning

GridSearchCV with 36 XGBoost combinations. 5-fold CV ensures robust evaluation
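
To make the search size concrete, the sketch below enumerates a grid with 36 combinations and counts the model fits that 5-fold cross-validation implies. The page does not publish the actual grid, so the parameter values here are hypothetical; only the totals (36 combinations, 5 folds) come from the text.

```python
from itertools import product

# Hypothetical grid: 3 x 3 x 2 x 2 = 36 combinations. The parameter names are
# standard XGBoost hyperparameters, but the values are illustrative only.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "subsample": [0.8, 1.0],
}


def expand_grid(grid):
    """Return every combination of the grid as a list of parameter dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
n_folds = 5
total_fits = len(candidates) * n_folds

if __name__ == "__main__":
    print(len(candidates), "combinations,", total_fits, "cross-validation fits")
    print(candidates[0])
```

An exhaustive search over this grid trains 180 models during cross-validation, plus one final refit on the full training set when `GridSearchCV` is left at its default `refit=True`.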

Technology Stack

Python 3.8+
XGBoost
Scikit-learn
Pandas
NumPy
Matplotlib
Seaborn
SciPy
GridSearchCV
K-Fold CV