The Dzud — Finding the Disasters

In this notebook you will load a dataset covering 24 years of livestock mortality across all 21 Mongolian provinces (aimags) and find the disasters hiding in the data.

By the end you will be able to answer: which years were catastrophic, which aimags were hit hardest, and what does a dzud look like when you visualise it?

Step 1: Load the data

We use httpx to fetch the CSV directly from the web, then read it into pandas via io.StringIO — a way of treating the downloaded text as if it were a file on disk.
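To see the io.StringIO idea on its own, here is a minimal offline sketch. The two-row CSV text and the aimag names in it are made up for illustration and are not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample; these values are invented for illustration only.
csv_text = (
    "aimag,year,mortality_pct\n"
    "ExampleAimagA,1990,2.5\n"
    "ExampleAimagB,1990,4.0\n"
)

# io.StringIO wraps the string in a file-like object that read_csv can consume,
# so no file ever needs to be written to disk.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.dtypes)
```

The download below follows the same pattern, with response.text taking the place of csv_text.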

import httpx
import io
import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/simonguest/codercub/main/labs/03/notebooks/mongolia_dzud_1990_2013.csv"
response = httpx.get(url)
response.raise_for_status()  # stop here with a clear error if the download failed
df = pd.read_csv(io.StringIO(response.text))

print(f"Loaded {len(df)} rows and {len(df.columns)} columns")
print(f"Years covered: {df['year'].min()} to {df['year'].max()}")
print(f"Aimags: {df['aimag'].nunique()}")
df.head()

Step 2: Understand the structure

This dataset has a different shape from the air quality data you worked with yesterday. Rather than one row per day, there is one row per aimag per year — 21 aimags times 24 years gives 504 rows.
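One way to confirm a one-row-per-aimag-per-year shape is to compare the row count with the number of aimags times the number of years, and to check that no pair is repeated. The sketch below uses a made-up toy table (2 hypothetical aimags, 3 years), so the same lines should work on df if its columns are named aimag and year.

```python
import pandas as pd

# Made-up toy table: 2 hypothetical aimags x 3 years = 6 rows.
toy = pd.DataFrame({
    "aimag": ["ExampleAimagA"] * 3 + ["ExampleAimagB"] * 3,
    "year": [1990, 1991, 1992] * 2,
    "mortality_pct": [2.0, 3.5, 1.0, 4.0, 6.5, 2.5],
})

n_aimags = toy["aimag"].nunique()
n_years = toy["year"].nunique()
expected_rows = n_aimags * n_years

# True when no aimag-year pair appears more than once.
no_duplicates = not toy.duplicated(subset=["aimag", "year"]).any()

# Right row count plus no repeats means every pair appears exactly once.
is_complete = len(toy) == expected_rows and no_duplicates

print(f"{n_aimags} aimags x {n_years} years = {expected_rows} expected rows")
print(f"Actual rows: {len(toy)}, complete grid: {is_complete}")
```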

Before plotting anything, take a moment to understand what each column contains.

df.info()
df.describe().round(1)

A few things worth noting:

  • mortality_pct is the key outcome — the percentage of the total herd that died over the winter
  • winter_temp_c is the average temperature from November to February; negative numbers mean cold, and Mongolian winters are very cold
  • summer_drought_idx runs from 0 (no drought) to 1 (severe drought) for the preceding summer
  • livestock_start and livestock_lost give you the raw animal counts if you want to work with absolute numbers
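If you do work with the raw counts, a percentage can be rebuilt from them. The sketch below assumes that mortality is simply animals lost divided by animals at the start, times 100. That formula is an assumption and has not been checked against how mortality_pct was calculated in the real CSV. The numbers are made up.

```python
import pandas as pd

# Made-up counts for two hypothetical aimags, not taken from the real CSV.
toy = pd.DataFrame({
    "aimag": ["ExampleAimagA", "ExampleAimagB"],
    "livestock_start": [1_000_000, 500_000],
    "livestock_lost": [50_000, 100_000],
})

# Assumption: mortality is losses as a share of the starting herd.
toy["mortality_check_pct"] = (
    toy["livestock_lost"] / toy["livestock_start"] * 100
).round(1)

print(toy)
```

Comparing a column like this with mortality_pct on the real data is a quick way to test whether the assumption holds.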

What is the highest mortality percentage recorded in the dataset? What is the lowest?

Step 3: National average mortality over time

Because each year contains 21 rows (one per aimag), we need to aggregate before we can plot a time series.

groupby lets us group all rows that share the same year, then calculate a summary statistic across those groups — in this case, the mean mortality percentage.
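Here is what that looks like on a tiny made-up table with two hypothetical aimags and two years, small enough to check by hand.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "year": [1990, 1990, 1991, 1991],
    "aimag": ["ExampleAimagA", "ExampleAimagB", "ExampleAimagA", "ExampleAimagB"],
    "mortality_pct": [2.0, 4.0, 10.0, 20.0],
})

# Rows sharing a year are grouped together, then averaged:
# 1990 -> (2.0 + 4.0) / 2, 1991 -> (10.0 + 20.0) / 2
yearly = toy.groupby("year")["mortality_pct"].mean().reset_index()

print(yearly)
```

Four rows go in and two come out, one per year. The real data collapses in the same way, from one row per aimag per year down to one row per year.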

national = df.groupby("year")["mortality_pct"].mean().reset_index()
national.columns = ["year", "avg_mortality_pct"]

print(national.to_string(index=False))

Before you run the chart: looking at the table above, which two years stand out as the worst? Write down your prediction before plotting.

fig, ax = plt.subplots(figsize=(13, 5))

ax.plot(national["year"], national["avg_mortality_pct"],
    color="steelblue", linewidth=2, marker="o", markersize=5)

ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Average mortality (%)", fontsize=12)
ax.set_title("National average livestock mortality — Mongolia 1990–2013",
       fontsize=13, fontweight="bold")

ax.set_xticks(national["year"])
ax.set_xticklabels(national["year"], rotation=45, ha="right", fontsize=9)

plt.tight_layout()
plt.show()

The two peaks correspond to the dzud catastrophes documented by researchers and the Mongolian government.

The 2000–2002 event and the 2009–2010 event together killed an estimated 20 million animals — roughly a third of Mongolia’s entire national herd at the time.

Did the chart match your prediction? Were there any years that surprised you — either higher or lower than you expected? Describe the shape of the line.

Step 4: Which aimags were hit hardest?

The national average tells you when disasters happened. But Mongolia is large (roughly the size of Western Europe) and conditions vary enormously across its 21 provinces.

Let’s look at average mortality by aimag across the full 24-year period to find which regions are most vulnerable.

from matplotlib.patches import Patch

by_aimag = (df.groupby(["aimag", "region"])["mortality_pct"]
      .mean()
      .round(1)
      .reset_index()
      .sort_values("mortality_pct", ascending=False))

fig, ax = plt.subplots(figsize=(12, 6))

region_palette = {
  "South":   "#E53935",
  "West":    "#FB8C00",
  "Central": "#8E24AA",
  "North":   "#1E88E5",
  "East":    "#43A047"
}

bar_colors = [region_palette[r] for r in by_aimag["region"]]

ax.barh(by_aimag["aimag"], by_aimag["mortality_pct"],
    color=bar_colors, edgecolor="white", linewidth=0.5)

ax.set_xlabel("Average mortality (%) across 1990–2013", fontsize=11)
ax.set_title("Average livestock mortality by aimag", fontsize=13, fontweight="bold")
ax.invert_yaxis()

legend_handles = [Patch(color=c, label=r) for r, c in region_palette.items()]
ax.legend(handles=legend_handles, title="Region", fontsize=9, loc="lower right")

plt.tight_layout()
plt.show()

Which regions dominate the top of the chart? Is there a clear geographic pattern? What might explain why certain parts of Mongolia are more vulnerable?

Step 5: The full picture: A heatmap of every aimag, every year

The line chart shows when disasters happened. The bar chart shows which aimags are most vulnerable on average. But neither shows both dimensions at once.

A heatmap lets us put all 504 data points on a single grid: years along the x-axis, aimags along the y-axis, and colour representing mortality percentage. This gives us the most complete view of the data.
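Building that grid means reshaping the data from "long" (one row per aimag per year) to "wide" (one row per aimag, one column per year). A minimal sketch of the reshape on a made-up table with two hypothetical aimags:

```python
import pandas as pd

# Made-up long-format table: one row per aimag per year.
toy = pd.DataFrame({
    "aimag": ["ExampleAimagA", "ExampleAimagA", "ExampleAimagB", "ExampleAimagB"],
    "year": [1990, 1991, 1990, 1991],
    "mortality_pct": [2.0, 10.0, 4.0, 20.0],
})

# Wide format: aimags become rows, years become columns.
wide = toy.pivot(index="aimag", columns="year", values="mortality_pct")

# Put the aimag with the highest average mortality first.
order = (toy.groupby("aimag")["mortality_pct"]
            .mean()
            .sort_values(ascending=False)
            .index)
wide = wide.loc[order]

print(wide)
```

Note that pivot raises an error if any aimag and year pair appears more than once, so it only works when the data really has one row per pair.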

# Pivot the data so rows are aimags and columns are years
heatmap_data = df.pivot(index="aimag", columns="year", values="mortality_pct")

# Sort aimags by their average mortality so the most vulnerable appear at the top
aimag_order = df.groupby("aimag")["mortality_pct"].mean().sort_values(ascending=False).index
heatmap_data = heatmap_data.loc[aimag_order]

fig, ax = plt.subplots(figsize=(16, 8))

im = ax.imshow(heatmap_data.values, aspect="auto", cmap="YlOrRd",
         vmin=0, vmax=35)

ax.set_xticks(range(len(heatmap_data.columns)))
ax.set_xticklabels(heatmap_data.columns, rotation=45, ha="right", fontsize=9)

ax.set_yticks(range(len(heatmap_data.index)))
ax.set_yticklabels(heatmap_data.index, fontsize=9)

ax.set_title("Livestock mortality (%) by aimag and year — darker red means more deaths",
       fontsize=13, fontweight="bold", pad=12)

plt.colorbar(im, ax=ax, label="Mortality (%)", shrink=0.8)
plt.tight_layout()
plt.show()

A few things to look for:

  • In the catastrophic years (2001, 2010), how many aimags show dark red? Is any aimag spared?
  • In good years, do the same aimags consistently show lighter colours?
  • Can you spot any aimag that seems to suffer even in relatively mild years?

Step 6: Zoom in on the worst year

Let’s take a closer look at 2001 — the single worst year in the dataset — and see how mortality varied across aimags.

worst_year = 2001

df_worst = (df[df["year"] == worst_year]
      .sort_values("mortality_pct", ascending=False))

fig, ax = plt.subplots(figsize=(12, 6))

ax.bar(df_worst["aimag"], df_worst["mortality_pct"],
     color="firebrick", edgecolor="white", linewidth=0.5)

ax.axhline(y=df_worst["mortality_pct"].mean(), color="black",
       linestyle="--", linewidth=1.2,
       label=f"National average: {df_worst['mortality_pct'].mean():.1f}%")

ax.set_ylabel("Mortality (%)", fontsize=11)
ax.set_title(f"Livestock mortality by aimag — {worst_year}",
       fontsize=13, fontweight="bold")
# Rotate the existing tick labels; calling set_xticklabels without set_xticks can trigger a warning
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", fontsize=9)
ax.legend(fontsize=10)

plt.tight_layout()
plt.show()

Even in the worst year, there is variation between aimags. Some are well above the national average; others are below it.
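To put numbers on that variation, you can measure each aimag's gap from the average and count how many sit above it. The sketch below uses four hypothetical aimags with made-up values. On the real data, df_worst would take the place of toy_year.

```python
import pandas as pd

# Made-up single-year values for four hypothetical aimags.
toy_year = pd.DataFrame({
    "aimag": ["ExampleAimagA", "ExampleAimagB", "ExampleAimagC", "ExampleAimagD"],
    "mortality_pct": [30.0, 20.0, 10.0, 4.0],
})

avg = toy_year["mortality_pct"].mean()

# Positive gap = worse than average, negative gap = better than average.
toy_year["gap_from_avg"] = toy_year["mortality_pct"] - avg
above = toy_year[toy_year["gap_from_avg"] > 0]

print(f"Average: {avg:.1f}%")
print(f"{len(above)} of {len(toy_year)} aimags are above the average")
```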

Based on everything you have seen today, what is your hypothesis? What do you think causes some years to become disasters while others don’t, and why do some aimags suffer more than others?

What you have done in this notebook

You loaded a multi-dimensional dataset (21 aimags across 24 years) and navigated its structure using groupby aggregation. You built four different visualisations that each reveal something different about the same data: when disasters happened, which regions are most vulnerable, the full pattern across all aimags and years simultaneously, and the variation within a single catastrophic year.

Check Your Understanding

{ "question_type": "multiple_choice", "question": "What does groupby allow you to do with a DataFrame?", "options": [ { "key": "a", "text": "Filter rows that match a condition" }, { "key": "b", "text": "Aggregate data by groups and calculate summaries" }, { "key": "c", "text": "Sort the rows in a new order" }, { "key": "d", "text": "Add new columns to the DataFrame" } ], "answer": "b", "submitted_answer": "" }

{ "question_type": "multiple_choice", "question": "In the heatmap, what does a darker red cell represent?", "options": [ { "key": "a", "text": "A colder winter temperature" }, { "key": "b", "text": "A higher livestock mortality rate" }, { "key": "c", "text": "A lower mortality rate" }, { "key": "d", "text": "More rainfall that year" } ], "answer": "b", "submitted_answer": "" }

{ "question_type": "true_false", "question": "In 2001, every aimag in Mongolia experienced exactly the same livestock mortality rate.", "answer": "False", "submitted_answer": "" }