import pandas as pd
import io
import httpx
url = "https://raw.githubusercontent.com/simonguest/codercub/main/labs/02/notebooks/ulaanbaatar_aqi_2019_2021.csv"
response = httpx.get(url)
response.raise_for_status()  # stop early if the download failed
df = pd.read_csv(io.StringIO(response.text))
print(f"Loaded {len(df)} rows and {len(df.columns)} columns")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
Exploring Air Quality
What does a year of pollution in Ulaanbaatar look like?
Ulaanbaatar, the capital of Mongolia, is one of the most polluted cities on Earth, but only at certain times of year.
In this notebook you’ll:
1. Load a real dataset of daily air quality measurements
2. Explore its shape and contents
3. Plot a full year of pollution to uncover the pattern
Your mission: By the end of this notebook, you should be able to answer: “When is the air dangerous — and when is it safe?”
Step 1: Load the data
We’ll use pandas — Python’s most popular data analysis library — to load our dataset.
The dataset contains daily air quality readings from Ulaanbaatar, covering 2019 to 2021. Each row is one day. Each column is one measurement or label.
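The loading cell fetches the CSV over HTTP and hands the text to pd.read_csv(). As a minimal offline sketch of the same idea, the snippet below reads a tiny made-up sample instead. The rows are invented for illustration, and the column names (including year, which the plotting code filters on) are assumed to match the real file.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration only.
sample_csv = """date,year,temp_c,pm25,aqi_category
2019-01-01,2019,-25.0,180.0,Very Unhealthy
2019-01-02,2019,-22.5,95.0,Unhealthy
2019-07-01,2019,21.0,9.0,Good
"""

# parse_dates converts the date column to a datetime type while loading
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date"])

print(f"Loaded {len(sample)} rows and {len(sample.columns)} columns")
print(f"Date range: {sample['date'].min().date()} to {sample['date'].max().date()}")
```

Parsing dates at load time is optional; the notebook converts the column later with pd.to_datetime(), which gives the same result.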
Step 2: Take a first look
Before we analyse anything, let’s see what the data actually looks like.
df.head() shows the first 5 rows — like opening the first page of a book.
# Show the first 5 rows
df.head()
Let’s also check what columns we have and what types they are:
# Column names and data types
df.info()
🔍 Look at the columns:
- date: the day of the measurement
- temp_c: temperature in Celsius
- pm25: fine particulate matter (µg/m³), the main measure of air pollution
- aqi_category: a label describing how dangerous the air is that day
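PM2.5 refers to tiny particles smaller than 2.5 micrometres in diameter. They’re small enough to reach deep into your lungs and enter your bloodstream. The WHO’s 2021 air quality guidelines recommend that the 24-hour average stay at or below 15 µg/m³, so this notebook treats anything above 15 µg/m³ as unsafe for daily exposure.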
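To make the 15 µg/m³ threshold concrete, here is a small sketch that flags readings above it. The readings and the names WHO_DAILY_LIMIT, above_limit, and times_limit are made up for illustration; they are not part of the dataset.

```python
import pandas as pd

WHO_DAILY_LIMIT = 15  # µg/m³, the daily limit quoted in this notebook

# Hypothetical readings, invented for illustration
readings = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-07-01", "2019-07-02"]),
    "pm25": [180.0, 9.0, 15.0],
})

# True where the day's PM2.5 is strictly above the limit
readings["above_limit"] = readings["pm25"] > WHO_DAILY_LIMIT

# How many times over the limit each reading is
readings["times_limit"] = (readings["pm25"] / WHO_DAILY_LIMIT).round(1)

print(readings)
```

A reading of exactly 15 µg/m³ is not flagged here, because the comparison is "above", not "at or above".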
Step 3: Quick summary statistics
df.describe() gives you the min, max, average, and more for every numeric column — a fast way to get a feel for the data.
df.describe().round(1)
Questions to think about:
- What is the average PM2.5 across all three years?
- What is the maximum PM2.5 recorded? How does that compare to the WHO limit of 15 µg/m³?
- What is the coldest temperature in the dataset? The warmest?
What did you find in df.describe()? What is the average PM2.5 across all three years? How does the maximum reading compare to the WHO safe limit of 15 µg/m³?
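If you would rather pull the answers out in code than read them off the table, one possible approach is to index into the result of describe(). The sketch below uses a made-up four-row stand-in called demo; in the notebook you would use df instead.

```python
import pandas as pd

WHO_DAILY_LIMIT = 15  # µg/m³

# Hypothetical stand-in for df; the values are invented for illustration
demo = pd.DataFrame({
    "temp_c": [-30.0, -5.0, 20.0, 25.0],
    "pm25": [200.0, 60.0, 10.0, 10.0],
})

# describe() returns a DataFrame indexed by statistic name
summary = demo.describe().round(1)
mean_pm25 = summary.loc["mean", "pm25"]
max_pm25 = summary.loc["max", "pm25"]

print(f"Average PM2.5: {mean_pm25} µg/m³")
print(f"Maximum PM2.5: {max_pm25} µg/m³, {max_pm25 / WHO_DAILY_LIMIT:.1f}x the WHO limit")
print(f"Coldest: {summary.loc['min', 'temp_c']} °C, warmest: {summary.loc['max', 'temp_c']} °C")
```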
Step 4: Plot a full year of pollution
Now for the main question: what does a whole year of pollution look like?
We’ll use matplotlib to draw a line chart of PM2.5 across every day of 2019.
import matplotlib.pyplot as plt
# Filter to just 2019
df_2019 = df[df['year'] == 2019].copy()
# Convert the date column to a proper datetime type so the x-axis looks nice
df_2019['date'] = pd.to_datetime(df_2019['date'])
# Create the plot
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(df_2019['date'], df_2019['pm25'], color='steelblue', linewidth=1.2, alpha=0.9)
# Add a horizontal line for the WHO daily safe limit
ax.axhline(y=15, color='green', linestyle='--', linewidth=1.5, label='WHO safe limit (15 µg/m³)')
# Labels and title
ax.set_title('Daily PM2.5 in Ulaanbaatar — 2019', fontsize=15, fontweight='bold', pad=12)
ax.set_xlabel('Date', fontsize=11)
ax.set_ylabel('PM2.5 (µg/m³)', fontsize=11)
ax.legend(fontsize=10)
plt.tight_layout()
plt.show()
What shape do you see?
Describe the pattern in your own words before reading on.
Notice the green dashed line. That’s the WHO daily safe limit of 15 µg/m³. How many days in 2019 were below that line?
Describe the shape of the chart in your own words. When is pollution highest? When is it lowest? What do you think might cause this pattern?
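Counting the days below the green line by eye is hard, so here is a rough sketch of how you might count them instead. It runs on a small made-up table named demo; with the real data you would apply the same comparison to df_2019.

```python
import pandas as pd

WHO_DAILY_LIMIT = 15  # µg/m³

# Hypothetical stand-in for df; the values are invented for illustration
demo = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020],
    "pm25": [120.0, 14.0, 8.0, 5.0],
})

demo_2019 = demo[demo["year"] == 2019]

# A boolean comparison sums to the number of True values
below = (demo_2019["pm25"] < WHO_DAILY_LIMIT).sum()
share = below / len(demo_2019)

print(f"{below} of {len(demo_2019)} days below the limit ({share:.0%})")
```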
Step 5: Add some color
The line chart shows the pattern, but let’s make it easier to read by coloring each day by its AQI category — so we can immediately see which days are dangerous.
# Colour map for AQI categories
aqi_colors = {
    'Good': '#00e400',
    'Moderate': '#ffff00',
    'Unhealthy for Sensitive Groups': '#ff7e00',
    'Unhealthy': '#ff0000',
    'Very Unhealthy': '#8f3f97',
    'Hazardous': '#7e0023'
}
fig, ax = plt.subplots(figsize=(14, 5))
# Plot each day as a coloured vertical bar
for _, row in df_2019.iterrows():
    color = aqi_colors.get(row['aqi_category'], 'gray')
    ax.bar(row['date'], row['pm25'], color=color, width=1.0, alpha=0.85)
# WHO limit line
ax.axhline(y=15, color='black', linestyle='--', linewidth=1.2, label='WHO safe limit (15 µg/m³)')
# Legend
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=aqi_colors[cat], label=cat) for cat in aqi_colors]
ax.legend(handles=legend_elements, fontsize=8, loc='upper right')
ax.set_title('Daily PM2.5 in Ulaanbaatar — 2019 (coloured by AQI category)', fontsize=14, fontweight='bold', pad=12)
ax.set_xlabel('Date', fontsize=11)
ax.set_ylabel('PM2.5 (µg/m³)', fontsize=11)
plt.tight_layout()
plt.show()
What do you notice now?
- Which color dominates January and February?
- Are there any green or yellow days in winter?
- How long does the “safe” window in summer last?
Look at the coloured chart. Which AQI category dominates January and February? How many roughly safe (green or yellow) days are there in winter? How long does the clean window in summer last?
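The chart answers these questions visually; a table of counts can back up what you see. The sketch below is one way to do it with pd.crosstab(), using a made-up sample named demo. Treating Good and Moderate as the "roughly safe" categories follows the green-or-yellow wording above.

```python
import pandas as pd

# Hypothetical stand-in for df_2019; the values are invented for illustration
demo = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-05", "2019-01-12", "2019-01-20",
        "2019-07-04", "2019-07-15", "2019-07-28",
    ]),
    "aqi_category": [
        "Hazardous", "Hazardous", "Very Unhealthy",
        "Good", "Good", "Moderate",
    ],
})

demo["month"] = demo["date"].dt.month

# Rows are months, columns are categories, cells are day counts
counts = pd.crosstab(demo["month"], demo["aqi_category"])
print(counts)

# The most common category in each month
dominant = counts.idxmax(axis=1)

# Number of green or yellow days in each month
safe_days = demo["aqi_category"].isin(["Good", "Moderate"]).groupby(demo["month"]).sum()
print(safe_days)
```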
Step 6: Your turn — try 2020 and 2021
The cells above only showed 2019. Now it’s your turn to explore the other years.
Modify the code below to plot 2020 instead of 2019. Then try 2021.
Does the pattern look the same? Is the pollution getting better or worse over time?
# ✏️ YOUR TURN: change 2019 to 2020 or 2021 and re-run
df_your_year = df[df['year'] == 2019].copy() # <-- change this year
df_your_year['date'] = pd.to_datetime(df_your_year['date'])
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(df_your_year['date'], df_your_year['pm25'], color='steelblue', linewidth=1.2)
ax.axhline(y=15, color='green', linestyle='--', linewidth=1.5, label='WHO safe limit (15 µg/m³)')
ax.set_title(f"Daily PM2.5 in Ulaanbaatar — {df_your_year['year'].iloc[0]}", fontsize=14, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('PM2.5 (µg/m³)')
ax.legend()
plt.tight_layout()
plt.show()
Step 7: All three years together
Let’s put all three years on the same chart so it’s easy to compare.
df['date'] = pd.to_datetime(df['date'])
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=False, sharey=True)
years = [2019, 2020, 2021]
colors = ['steelblue', 'darkorange', 'seagreen']
for ax, year, color in zip(axes, years, colors):
    df_year = df[df['year'] == year]
    ax.plot(df_year['date'], df_year['pm25'], color=color, linewidth=1.0)
    ax.axhline(y=15, color='gray', linestyle='--', linewidth=1.0)
    ax.set_title(f'{year}', fontsize=12, fontweight='bold')
    ax.set_ylabel('PM2.5')
    mean_val = df_year['pm25'].mean()
    ax.text(0.01, 0.92, f'Yearly average: {mean_val:.0f} µg/m³',
            transform=ax.transAxes, fontsize=9, color='black',
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
axes[-1].set_xlabel('Date', fontsize=11)
fig.suptitle('PM2.5 in Ulaanbaatar — 2019 to 2021', fontsize=15, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()
Discussion questions:
1. Is the shape the same in all three years? What does that tell you about the cause?
2. Look at the yearly averages shown on each chart. Is the air quality improving?
3. Can you spot any unusual spikes, days that were far worse than the surrounding days? What might cause those?
Compare the three years. Is the seasonal pattern the same each year? Are the yearly averages getting better or worse? Did you spot any unusual spikes — and what might have caused them?
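Two of these questions can also be checked numerically. The sketch below computes yearly averages with groupby() and flags possible spikes by comparing each day with the rolling median of its neighbours. The data is made up, and the spike rule (more than twice a 5-day centered median) is an arbitrary assumption chosen for illustration, not a standard definition.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration
demo = pd.DataFrame({
    "year": [2019] * 5 + [2020] * 5,
    "pm25": [100.0, 110.0, 400.0, 105.0, 95.0, 80.0, 85.0, 90.0, 82.0, 88.0],
})

# Average PM2.5 for each year
yearly_mean = demo.groupby("year")["pm25"].mean()
print(yearly_mean)

# Median of each day and its two neighbours on either side
rolling_median = demo["pm25"].rolling(window=5, center=True, min_periods=1).median()

# Assumed rule: a spike is a day more than twice its local median
demo["spike"] = demo["pm25"] > 2 * rolling_median
print(demo[demo["spike"]])
```

A median is used here rather than a mean so that the spike itself does not drag the baseline upward. Note that this simple version lets the window run across the boundary between years.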
What you’ve learned
In this notebook you:
- Loaded a CSV dataset from the web using pd.read_csv()
- Explored its structure with .head(), .info(), and .describe()
- Created a line chart and a coloured bar chart using matplotlib
- Compared patterns across three years
The pattern you found, extreme winter pollution followed by clean summers, is one of the best-documented environmental problems in Central Asia. The main cause is coal heating in Ulaanbaatar’s ger districts (traditional tent communities), where hundreds of thousands of families burn coal through -30°C winters with no central heating.
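If you want to put a number on the winter versus summer gap, one possible sketch is to label each day with a season and average PM2.5 within each label. The data is made up, the season() helper is hypothetical, and grouping December to February as winter and June to August as summer is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration
demo = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-10", "2019-02-10", "2019-07-10", "2019-08-10", "2019-12-10",
    ]),
    "pm25": [200.0, 180.0, 10.0, 12.0, 160.0],
})


def season(month):
    """Map a month number to a season label (assumed grouping)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (6, 7, 8):
        return "summer"
    return "other"


demo["season"] = demo["date"].dt.month.map(season)

# Average PM2.5 within each season
seasonal_mean = demo.groupby("season")["pm25"].mean()
print(seasonal_mean)
```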
Check Your Understanding
{ "question_type": "multiple_choice", "question": "What does PM2.5 measure?", "options": [ { "key": "a", "text": "Air temperature" }, { "key": "b", "text": "Wind speed" }, { "key": "c", "text": "Tiny particles in the air small enough to enter your lungs" }, { "key": "d", "text": "Humidity" } ], "answer": "c", "submitted_answer": "" }
{ "question_type": "multiple_choice", "question": "What does df.describe() show you?", "options": [ { "key": "a", "text": "The first 5 rows of the dataset" }, { "key": "b", "text": "Summary statistics like min, max, and average" }, { "key": "c", "text": "The column names and data types" }, { "key": "d", "text": "A chart of the data" } ], "answer": "b", "submitted_answer": "" }
{ "question_type": "multiple_choice", "question": "Based on the charts, when is air pollution in Ulaanbaatar at its worst?", "options": [ { "key": "a", "text": "Summer" }, { "key": "b", "text": "Spring" }, { "key": "c", "text": "Autumn" }, { "key": "d", "text": "Winter" } ], "answer": "d", "submitted_answer": "" }
{ "question_type": "true_false", "question": "The WHO considers any PM2.5 reading above 15 µg/m³ unsafe for daily exposure.", "answer": "True", "submitted_answer": "" }