The Dzud: Finding the Cause
In this notebook you will look for the variables that explain livestock mortality. You will start with the most obvious suspect, winter temperature, and discover that it tells only part of the story. Finding the second piece of the puzzle is the core task.
By the end you will have a clear, evidence-backed answer to: what combination of conditions turns a harsh winter into a national disaster?
Step 1: Load the data
import io
import httpx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Download the CSV and fail loudly if the request did not succeed
url = "https://raw.githubusercontent.com/simonguest/codercub/main/labs/03/notebooks/mongolia_dzud_1990_2013.csv"
response = httpx.get(url)
response.raise_for_status()

# Parse the downloaded text into a DataFrame
df = pd.read_csv(io.StringIO(response.text))
print(f"Loaded {len(df)} rows")
df.head()
Step 2: The primary suspect — winter temperature
Mongolia’s winters are severe by any standard. But some winters are much colder than others. The obvious hypothesis is: colder winters kill more livestock.
Let’s test that with a scatter plot. Each point is one aimag in one year.
Before you run the scatter plot: if colder winters cause more livestock deaths, what shape would you expect the points to form? Do you think all aimags will fit the pattern, or will some be outliers?
fig, ax = plt.subplots(figsize=(9, 6))
ax.scatter(
    df["winter_temp_c"],
    df["mortality_pct"],
    alpha=0.25,
    s=14,
    color="steelblue"
)
ax.set_xlabel("Winter temperature (°C)", fontsize=12)
ax.set_ylabel("Mortality (%)", fontsize=12)
ax.set_title("Winter temperature vs. livestock mortality — all aimags, 1990–2013",
             fontsize=13, fontweight="bold")
plt.tight_layout()
plt.show()
Describe what you see in the scatter plot. Is there a clear slope? At the same temperature, why might some aimags lose 10% of their herd while others lose 30%? What does that spread suggest to you?
Step 3: Measure the relationship
In the previous notebook you calculated a correlation coefficient for the air quality data. Let’s do the same here and compare.
corr_temp = df["mortality_pct"].corr(df["winter_temp_c"])
print(f"Correlation between winter temperature and mortality: {corr_temp:.3f}")
Compare this value to the -0.85 correlation you found between temperature and PM2.5 in the air quality data.
The relationship here is real, but temperature alone leaves much of the variation in mortality unexplained, so something else is probably also a factor.
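To make the number less of a black box, here is a minimal sketch of what a Pearson correlation computes, checked against the pandas `.corr()` call used above. The six rows in `demo` are made-up illustrative values, not taken from the dzud dataset.

```python
import numpy as np
import pandas as pd

# Made-up illustrative values, NOT taken from the dzud dataset.
demo = pd.DataFrame({
    "winter_temp_c": [-28.0, -25.0, -22.0, -20.0, -18.0, -15.0],
    "mortality_pct": [24.0, 15.0, 18.0, 9.0, 11.0, 4.0],
})

x = demo["winter_temp_c"].to_numpy()
y = demo["mortality_pct"].to_numpy()

# Pearson r: how much the two columns vary together, scaled by how much
# each one varies on its own.
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# The same calculation using the pandas method from the cell above.
r_pandas = demo["mortality_pct"].corr(demo["winter_temp_c"])

print(f"Manual r: {r_manual:.3f}")
print(f"pandas r: {r_pandas:.3f}")
print(f"r squared: {r_manual ** 2:.2f}")
```

Squaring r gives a rough sense of how much of the variation a straight-line relationship accounts for. A correlation of about -0.65, for example, squares to roughly 0.42, which leaves more than half of the variation to be explained by something else.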
Step 4: Introducing the second variable
Mongolian herders have long observed that a bad summer makes a bad winter worse. When summer pastures are dried out by drought, livestock enter the winter season already weakened. They are thinner, with smaller fat reserves, less able to withstand extreme cold.
The dataset includes a summer_drought_idx column: a value from 0 (no drought) to 1 (severe drought) for the summer preceding each winter.
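Before relying on a new column, it can be worth confirming that it really stays inside its documented range and has no gaps. The sketch below runs that check on a tiny made-up frame called `demo`; with the real data you would pass `df` instead. The helper name `check_drought_index` is hypothetical and not part of the notebook.

```python
import pandas as pd


def check_drought_index(frame: pd.DataFrame, column: str = "summer_drought_idx") -> dict:
    """Summarise whether a drought index column stays within the expected 0 to 1 range."""
    values = frame[column]
    return {
        "missing": int(values.isna().sum()),
        # Count non-missing values that fall outside the inclusive range [0, 1]
        "out_of_range": int((~values.dropna().between(0, 1)).sum()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


# Made-up values that deliberately include one bad entry (1.2) and one gap.
demo = pd.DataFrame({"summer_drought_idx": [0.0, 0.35, 0.8, 1.0, 1.2, None]})
report = check_drought_index(demo)
print(report)
```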
Let’s check whether this variable also correlates with mortality.
corr_drought = df["mortality_pct"].corr(df["summer_drought_idx"])
print(f"Correlation between summer drought index and mortality: {corr_drought:.3f}")
The drought index has a meaningful positive correlation with mortality: the more severe the summer drought, the higher mortality tends to be in the following winter.
It is weaker than temperature on its own, but it is measuring something different. The question is whether it adds new information on top of what temperature already tells us, or whether it just duplicates it.
What is your intuition? Do you think dry summers and cold winters tend to happen in the same years, or are they independent events? Why does it matter for building a prediction model?
Step 5: Are the two variables independent?
If cold winters and droughts always happen in the same years, then drought is not adding new information. But if they are largely independent, then a year with both a cold winter AND a summer drought is genuinely more dangerous than a year with just one of those conditions.
Let’s check the correlation between the two predictor variables themselves.
corr_between = df["winter_temp_c"].corr(df["summer_drought_idx"])
print(f"Correlation between winter temperature and summer drought: {corr_between:.3f}")
A low correlation between the two means they are measuring largely independent things. Cold winters do not reliably follow dry summers. They are separate climate events that sometimes coincide.
When they do coincide, the combined effect on livestock is much worse than either alone. That is the key insight that researchers took years to establish, and you just found evidence for it in a few lines of code!
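One way to probe whether a second variable adds new information is to remove the part of mortality that temperature already explains and see whether the leftover still tracks drought. The sketch below does this on synthetic stand-in data in which temperature and drought are generated independently; the effect sizes (0.8 and 10) and the noise level are assumptions chosen for illustration, not estimates from the dzud dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (NOT the dzud dataset): temperature and drought are
# drawn independently, and both push mortality up by assumed amounts.
temp = rng.normal(-22.0, 4.0, n)
drought = rng.uniform(0.0, 1.0, n)
mortality = 5.0 - 0.8 * (temp + 22.0) + 10.0 * drought + rng.normal(0.0, 2.0, n)
demo = pd.DataFrame({
    "winter_temp_c": temp,
    "summer_drought_idx": drought,
    "mortality_pct": mortality,
})

# Step 1: fit a straight line that predicts mortality from temperature alone.
slope, intercept = np.polyfit(
    demo["winter_temp_c"].to_numpy(), demo["mortality_pct"].to_numpy(), 1
)

# Step 2: the residual is the part of mortality the temperature line misses.
residual = demo["mortality_pct"] - (slope * demo["winter_temp_c"] + intercept)

# Step 3: if drought carries new information, it should correlate with the residual.
corr_predictors = demo["winter_temp_c"].corr(demo["summer_drought_idx"])
corr_resid_drought = residual.corr(demo["summer_drought_idx"])

print(f"Temperature vs. drought: {corr_predictors:.3f}")
print(f"Drought vs. what temperature leaves unexplained: {corr_resid_drought:.3f}")
```

If the two predictors had been near-duplicates of each other, the temperature line would already have absorbed most of the drought effect and the second correlation would be close to zero. The same three steps can be run on `df` to check the real data.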
Step 6: Colouring the scatter by drought severity
The best way to see the combined effect is to go back to the temperature scatter plot and colour each point by its drought index. If drought is adding real information, you should see a pattern within the scatter: high-drought points clustering toward higher mortality at any given temperature.
fig, ax = plt.subplots(figsize=(10, 6))
scatter = ax.scatter(
    df["winter_temp_c"],
    df["mortality_pct"],
    c=df["summer_drought_idx"],
    cmap="YlOrRd",
    alpha=0.55,
    s=18,
    edgecolors="none"
)
plt.colorbar(scatter, ax=ax, label="Summer drought index (0 = none, 1 = severe)")
ax.set_xlabel("Winter temperature (°C)", fontsize=12)
ax.set_ylabel("Mortality (%)", fontsize=12)
ax.set_title("Winter temperature vs. mortality, coloured by summer drought",
             fontsize=13, fontweight="bold")
plt.tight_layout()
plt.show()
At a given temperature, say around -22°C, are the darker points (severe drought) sitting higher on the mortality axis than the lighter points (low drought)?
Describe what you observe.
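Reading colours off a crowded scatter plot can be hard, so a numeric check of the same question may help: take only the rows in a narrow temperature band and compare average mortality for high and low drought. The sketch below uses synthetic stand-in data; the band of -23°C to -21°C and the 0.5 cut-off are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 600

# Synthetic stand-in data (NOT the dzud dataset), built with assumed effect sizes.
temp = rng.normal(-22.0, 4.0, n)
drought = rng.uniform(0.0, 1.0, n)
mortality = 5.0 - 0.8 * (temp + 22.0) + 10.0 * drought + rng.normal(0.0, 2.0, n)
demo = pd.DataFrame({
    "winter_temp_c": temp,
    "summer_drought_idx": drought,
    "mortality_pct": mortality,
})

# Keep only rows whose winter temperature is close to -22 degrees C.
band = demo[demo["winter_temp_c"].between(-23.0, -21.0)]

# Within that band temperature is roughly constant, so any remaining
# difference in mortality is not explained by temperature.
high_mean = band.loc[band["summer_drought_idx"] >= 0.5, "mortality_pct"].mean()
low_mean = band.loc[band["summer_drought_idx"] < 0.5, "mortality_pct"].mean()

print(f"Rows in band: {len(band)}")
print(f"Mean mortality, high drought: {high_mean:.1f}%")
print(f"Mean mortality, low drought:  {low_mean:.1f}%")
```

Swapping `demo` for `df` runs the same comparison on the real data. A narrower band isolates temperature better but leaves fewer rows to average over.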
Step 7: Separating drought years from non-drought years
Let’s make that separation explicit. We will split the data into two groups — years with significant drought and years without — and plot them as separate series on the same scatter plot.
# Define a drought threshold
drought_threshold = 0.5

# Split the rows into high-drought and low-drought groups
df_drought = df[df["summer_drought_idx"] >= drought_threshold]
df_no_drought = df[df["summer_drought_idx"] < drought_threshold]

# Plot both groups on the same axes so they can be compared directly
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df_no_drought["winter_temp_c"], df_no_drought["mortality_pct"],
           alpha=0.35, s=14, color="steelblue", label="Low drought (index < 0.5)")
ax.scatter(df_drought["winter_temp_c"], df_drought["mortality_pct"],
           alpha=0.45, s=14, color="firebrick", label="High drought (index >= 0.5)")
ax.set_xlabel("Winter temperature (°C)", fontsize=12)
ax.set_ylabel("Mortality (%)", fontsize=12)
ax.set_title("The combined effect: cold winters and summer drought",
             fontsize=13, fontweight="bold")
ax.legend(fontsize=10)
plt.tight_layout()
plt.show()
The red points (high drought years) should sit systematically higher on the mortality axis than the blue points at the same temperature.
This is the mechanism behind Mongolia’s worst dzud events. A cold winter alone is survivable for healthy livestock. A cold winter following a drought — when animals enter winter already weakened — is not.
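To put a rough number on the gap between the two groups, one option is to fit a separate straight line to each and compare what the lines predict at the same temperature. This is a sketch on synthetic stand-in data, not the notebook's own method; the reference temperature of -22°C and the generated effect sizes are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 600

# Synthetic stand-in data (NOT the dzud dataset), built with assumed effect sizes.
temp = rng.normal(-22.0, 4.0, n)
drought = rng.uniform(0.0, 1.0, n)
mortality = 5.0 - 0.8 * (temp + 22.0) + 10.0 * drought + rng.normal(0.0, 2.0, n)
demo = pd.DataFrame({
    "winter_temp_c": temp,
    "summer_drought_idx": drought,
    "mortality_pct": mortality,
})

drought_threshold = 0.5
groups = {
    "low drought": demo[demo["summer_drought_idx"] < drought_threshold],
    "high drought": demo[demo["summer_drought_idx"] >= drought_threshold],
}

# Fit one straight line (slope, intercept) per group.
fits = {}
for label, group in groups.items():
    slope, intercept = np.polyfit(
        group["winter_temp_c"].to_numpy(), group["mortality_pct"].to_numpy(), 1
    )
    fits[label] = (slope, intercept)

# Compare what each line predicts at the same reference temperature.
ref_temp = -22.0
predicted = {label: s * ref_temp + i for label, (s, i) in fits.items()}
gap = predicted["high drought"] - predicted["low drought"]

for label, value in predicted.items():
    print(f"{label}: predicted mortality at {ref_temp}°C is {value:.1f}%")
print(f"Gap between groups: {gap:.1f} percentage points")
```

Both lines slope the same way (colder means higher mortality), while the gap between them is the extra loss associated with drought at a fixed temperature.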
The two catastrophic years in your data, 2001 and 2010, both involved an unusually cold winter preceded by significant summer drought across many aimags.
At this point you have done what scientists do: identified two independent variables that together explain an outcome better than either one alone. The next notebook will formalise that into a model: one that can take a winter temperature and a drought index as inputs and predict how much of the herd will be lost.
Check Your Understanding
{ "question_type": "multiple_choice", "question": "Winter temperature has a correlation of around -0.65 with livestock mortality. What does this mean?", "options": [ { "key": "a", "text": "There is no relationship between the two" }, { "key": "b", "text": "Warmer winters tend to produce higher mortality" }, { "key": "c", "text": "Colder winters tend to produce higher mortality" }, { "key": "d", "text": "Temperature and mortality are perfectly related" } ], "answer": "c", "submitted_answer": "" }
{ "question_type": "true_false", "question": "If two predictor variables are strongly correlated with each other, they both add independent new information to a model.", "answer": "False", "submitted_answer": "" }
{ "question_type": "multiple_choice", "question": "Why is it useful that winter temperature and summer drought are NOT strongly correlated with each other?", "options": [ { "key": "a", "text": "It means neither variable is important on its own" }, { "key": "b", "text": "It means they measure different things and can both improve a prediction model" }, { "key": "c", "text": "It means cold winters and droughts always happen in different years" }, { "key": "d", "text": "It makes the correlation calculation simpler" } ], "answer": "b", "submitted_answer": "" }