Click any city bubble to see detailed statistics. Scroll to zoom, drag to pan.
Adjust the weather conditions below to get an instant AQI prediction. The prediction comes from a linear regression model trained on three years of historical data (R² = 0.94).
Linear Regression
Surprisingly strong for a linear model. The near-linear relationship between particulate concentrations (PM2.5, PM10) and AQI means linear regression captures most of the variance.
Random Forest
Comparable to linear regression here, which suggests the relationships in the data are largely linear. A random forest would likely pull ahead on a larger, noisier real-world dataset.
Feature Importance
PM2.5 dominates because AQI is computed directly from pollutant concentrations, and PM2.5 is typically the pollutant that sets it. The more interesting finding is that weather factors (wind, humidity, temperature) together explain only ~1.5% of variance once particulate levels are known.
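One way to check a claim like this is to compare R² for a particulates-only model against a model that also includes weather. The sketch below does that on synthetic data; the variable names, value ranges, and coefficients are assumptions chosen for illustration, not the project's data or model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the real columns (assumed, not the project's data).
pm25 = rng.uniform(10, 250, n)
pm10 = pm25 * 1.5 + rng.normal(0, 15, n)
wind = rng.uniform(0, 12, n)
humidity = rng.uniform(20, 95, n)
temp = rng.uniform(5, 42, n)

# Assumed generating process: AQI is driven mostly by particulates,
# with a small contribution from wind and humidity plus noise.
aqi = 1.6 * pm25 + 0.3 * pm10 - 1.5 * wind + 0.1 * humidity + rng.normal(0, 10, n)


def r_squared(features, target):
    """Fit ordinary least squares with an intercept and return in-sample R^2."""
    X = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    return 1.0 - residual.var() / target.var()


r2_particulates = r_squared([pm25, pm10], aqi)
r2_full = r_squared([pm25, pm10, wind, humidity, temp], aqi)
weather_gain = r2_full - r2_particulates

print(f"particulates only: R^2 = {r2_particulates:.4f}")
print(f"with weather:      R^2 = {r2_full:.4f}")
print(f"gain from weather: {weather_gain:.4f}")
```

The difference between the two R² values is the share of variance that weather explains beyond particulates. On real data this comparison is more trustworthy on a held-out test split than in-sample.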
16,106 daily records across 15 cities
Three years of daily AQI, weather, and pollutant data (2021-2023), with realistic seasonal patterns and monsoon cycles built in. Approximately 2% of values were injected as missing and then cleaned as part of the pipeline.
319 outliers removed, 1,359 nulls imputed
AQI values above 500 (beyond the top of the 0-500 AQI scale) were removed. Missing values were forward-filled per city, with the city median as a fallback. Data types were standardised and date parsing was validated.
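The cleaning steps described above could look roughly like this in pandas. This is a minimal sketch on a made-up sample: the column names (`city`, `date`, `aqi`, `humidity`) and the values are assumptions, and the median fallback is taken from each city's observed values, which is one reasonable reading of the description.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "city": ["Delhi"] * 4 + ["Mumbai"] * 4,
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"] * 2,
    "aqi": [180.0, 620.0, 210.0, 190.0, 90.0, 85.0, 95.0, 88.0],
    "humidity": [np.nan, 55.0, 58.0, np.nan, 70.0, np.nan, 72.0, 75.0],
})

# Validate date parsing: errors="raise" fails loudly on a malformed date.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="raise")
df = df.sort_values(["city", "date"]).reset_index(drop=True)

# Drop readings above the top of the 0-500 AQI scale (missing AQI rows are kept).
clean = df[~(df["aqi"] > 500)].copy()

# Per-city median of the observed values, used where forward fill has nothing to copy.
city_median = clean.groupby("city")["humidity"].transform("median")

# Forward-fill within each city so values never leak across cities,
# then fall back to the city median for any leading gaps.
clean["humidity"] = clean.groupby("city")["humidity"].ffill().fillna(city_median)

print(clean)
```

Grouping by city before forward-filling matters: without it, the last reading of one city would be copied into the first missing row of the next.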
8 engineered features added
The features are: is_weekend, season (hemisphere-aware), is_monsoon per city, temperature range bucket, wind category, high pollution flag, 7-day rolling average AQI, and humidity category.
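A few of these features can be sketched in pandas as follows. The sample data, the `SOUTHERN` city lookup, the month-to-season mapping, the wind cut points, and the pollution threshold of 200 are all assumptions for illustration; the project's actual definitions may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical two-city sample; names, values, and thresholds are assumptions.
dates = pd.date_range("2023-01-01", periods=10, freq="D")
df = pd.DataFrame({
    "city": ["Delhi"] * 10 + ["Sydney"] * 10,
    "date": list(dates) * 2,
    "aqi": np.arange(100, 300, 10, dtype=float),
    "wind_speed": np.linspace(0.5, 19.5, 20),
})

# Weekend flag: pandas numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
df["is_weekend"] = df["date"].dt.dayofweek >= 5

# Hemisphere-aware season: shift southern-hemisphere months by six
# before applying a northern-hemisphere month-to-season mapping.
SOUTHERN = {"Sydney"}  # assumed lookup of southern-hemisphere cities
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
month = df["date"].dt.month
shifted = (month + 5) % 12 + 1
effective_month = month.where(~df["city"].isin(SOUTHERN), shifted)
df["season"] = effective_month.map(SEASON_BY_MONTH)

# 7-day rolling average AQI, computed within each city.
df["aqi_7d_avg"] = df.groupby("city")["aqi"].transform(
    lambda s: s.rolling(7, min_periods=1).mean()
)

# Wind category from assumed cut points.
df["wind_category"] = pd.cut(
    df["wind_speed"],
    bins=[0, 5, 12, np.inf],
    labels=["calm", "moderate", "strong"],
    include_lowest=True,
)

# High pollution flag with an assumed threshold.
df["high_pollution"] = df["aqi"] > 200

print(df.head(12))
```

Using `min_periods=1` keeps the first six days of each city from becoming missing, at the cost of averaging over a shorter window there.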
What I would add with more time
Activate live OpenWeatherMap API integration, add a 7-day AQI forecast using Facebook Prophet, deploy on Streamlit Cloud with a public URL, and incorporate NASA satellite aerosol optical depth (AOD) data as a ground truth comparison.