Visualizing Data with Matplotlib

Numbers in a table are useful, but charts make patterns much easier to see. In this notebook you’ll learn how to turn data into visualizations using Matplotlib.

What You’ll Learn

How to create line charts to show change over time
How to create bar charts to compare values side by side
How to create scatter plots to explore relationships between two variables

Setting Up

We’ll use the same NYC temperature data from the Pandas notebook. Let’s import both libraries and recreate the DataFrame:

import pandas as pd
import matplotlib.pyplot as plt

# Average monthly temperature for New York City, in degrees Celsius
data = {
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'avg_temp_c': [0, 1, 6, 12, 18, 23, 26, 25, 21, 15, 9, 3]
}

df = pd.DataFrame(data)
print("DataFrame loaded")

Part 1: Line Charts

A line chart is ideal for showing how something changes over time, such as temperature across the months of a year.

Let’s plot our temperature data:

plt.figure(figsize=(10, 5))
plt.plot(df['month'], df['avg_temp_c'], marker='o', color='steelblue', linewidth=2)
plt.title('Average Monthly Temperature — New York City')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

Part 2: Bar Charts

A bar chart is great for comparing values side by side. Let’s show the same data as bars, and color any cold months (below 10°C) differently:

# One color per bar: steelblue for months at or above 10°C, slateblue for colder months
colors = ['steelblue' if t >= 10 else 'slateblue' for t in df['avg_temp_c']]

plt.figure(figsize=(10, 5))
plt.bar(df['month'], df['avg_temp_c'], color=colors)
plt.title('Average Monthly Temperature — New York City')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.axhline(y=0, color='black', linewidth=0.8)
plt.show()
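
To check which color each bar receives, here is a small sketch that rebuilds the colors list in plain Python and pairs each color with its month. The names months, temps, and cold_months are introduced only for this illustration and are not part of the original cell.

```python
# Same months and temperatures as the DataFrame above, as plain lists
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
temps = [0, 1, 6, 12, 18, 23, 26, 25, 21, 15, 9, 3]

# One color per month, chosen by the same rule as the bar chart
colors = ['steelblue' if t >= 10 else 'slateblue' for t in temps]

# Months that fall below the 10°C threshold
cold_months = [m for m, t in zip(months, temps) if t < 10]

for month, temp, color in zip(months, temps, colors):
    print(f"{month}: {temp}°C -> {color}")

print("Cold months:", cold_months)
```

Because the list has exactly one entry per month, plt.bar can match each color to its bar by position.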

Explore Different Cities

Use the dropdown below to pick a different city and see how the temperature pattern changes!

CITY = "Sydney" #@param ["New York", "Sydney", "London", "Nairobi"]

city_data = {
  "New York":  [0,  1,  6, 12, 18, 23, 26, 25, 21, 15,  9,  3],
  "Sydney":    [26, 26, 24, 21, 18, 15, 14, 15, 17, 20, 22, 25],
  "London":    [6,  6,  9, 12, 15, 18, 20, 20, 17, 13,  9,  6],
  "Nairobi":   [22, 23, 23, 22, 21, 19, 18, 19, 21, 22, 21, 21]
}

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
temps = city_data[CITY]

plt.figure(figsize=(10, 5))
plt.plot(months, temps, marker='o', color='coral', linewidth=2)
plt.title(f'Average Monthly Temperature — {CITY}')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

What differences did you notice between cities? Which city has the most stable temperature year-round? Why do you think that is?
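
One way to compare the cities is to draw them all on the same axes instead of switching the dropdown. The following is a sketch rather than part of the original exercise: it repeats the city_data dictionary from the cell above, and it assumes the label argument of plt.plot together with plt.legend() is enough to tell the lines apart.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Same values as the city_data dictionary in the dropdown cell
city_data = {
    "New York": [0, 1, 6, 12, 18, 23, 26, 25, 21, 15, 9, 3],
    "Sydney":   [26, 26, 24, 21, 18, 15, 14, 15, 17, 20, 22, 25],
    "London":   [6, 6, 9, 12, 15, 18, 20, 20, 17, 13, 9, 6],
    "Nairobi":  [22, 23, 23, 22, 21, 19, 18, 19, 21, 22, 21, 21],
}

plt.figure(figsize=(10, 5))

# Draw one line per city; the label is what the legend displays
for city, city_temps in city_data.items():
    plt.plot(months, city_temps, marker='o', linewidth=2, label=city)

plt.title('Average Monthly Temperature by City')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend()
plt.show()
```

With all four lines on one chart, a flatter line means a smaller swing between the warmest and coldest months.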

Part 3: Scatter Plots

A scatter plot puts one variable on the x-axis and another on the y-axis, with one dot per data point. It’s the fastest way to see whether two variables move together.

For example: if students who study more hours tend to score higher, the dots will form an upward slope. If there’s no relationship, the dots will be scattered randomly.

Let’s plot hours studied against exam scores for a class of students:

study_data = {
  'hours_studied': [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10],
  'exam_score':    [42, 50, 55, 58, 65, 63, 70, 72, 75, 78, 82, 85, 88, 91, 95]
}
df_study = pd.DataFrame(study_data)

plt.figure(figsize=(8, 5))
plt.scatter(df_study['hours_studied'], df_study['exam_score'],
            color='steelblue', s=80, alpha=0.8)
plt.title('Hours Studied vs. Exam Score')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.grid(True, linestyle='--', alpha=0.4)
plt.show()

Look at the shape of the dots:

Do they slope upward, downward, or are they scattered randomly?
Are the dots tightly grouped, or spread out loosely?
Are there any students who don’t fit the pattern?
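
For comparison, here is a sketch of what "no relationship" might look like. The scores below are made up: they are drawn at random with Python's random module and are not real student data. The name random_scores and the seed value are assumptions chosen for this illustration.

```python
import random
import matplotlib.pyplot as plt

hours_studied = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10]

# Made-up scores drawn at random, so they ignore hours_studied entirely
rng = random.Random(42)  # fixed seed so the plot looks the same on every run
random_scores = [rng.randint(40, 100) for _ in hours_studied]

plt.figure(figsize=(8, 5))
plt.scatter(hours_studied, random_scores, color='gray', s=80, alpha=0.8)
plt.title('Hours Studied vs. Random Scores (illustration only)')
plt.xlabel('Hours Studied')
plt.ylabel('Random Score')
plt.grid(True, linestyle='--', alpha=0.4)
plt.show()
```

Because the scores here are generated without looking at the hours, any slope the dots appear to form is down to chance.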

Your turn! What if some students studied a lot but still scored low?

Try changing a few values in exam_score and re-running the cell below. What happens to the pattern?

You can also try changing the color or the dot size (s=80).

study_data = {
  'hours_studied': [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10],
  'exam_score':    [42, 50, 55, 58, 65, 63, 70, 72, 75, 78, 82, 85, 88, 91, 95]
}
df_study = pd.DataFrame(study_data)

plt.figure(figsize=(8, 5))
plt.scatter(df_study['hours_studied'], df_study['exam_score'],
            color='steelblue', s=80, alpha=0.8)
plt.title('Hours Studied vs. Exam Score')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.grid(True, linestyle='--', alpha=0.4)
plt.show()

Scatter plots are a key tool in data science for spotting relationships between variables. To put a number on how strong a relationship is, data scientists usually pair the plot with a statistic such as the correlation coefficient.
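
As a sketch of how the strength of a relationship might be quantified, the Pearson correlation coefficient can be computed with the pandas Series.corr method. This goes one step beyond the plots in this notebook; the variable name r is introduced here for illustration.

```python
import pandas as pd

# Same study data as the scatter plot above
study_data = {
    'hours_studied': [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10],
    'exam_score':    [42, 50, 55, 58, 65, 63, 70, 72, 75, 78, 82, 85, 88, 91, 95]
}
df_study = pd.DataFrame(study_data)

# Pearson correlation ranges from -1 to +1:
#   close to +1 -> dots slope upward in a tight line
#   close to  0 -> no linear relationship
#   close to -1 -> dots slope downward in a tight line
r = df_study['hours_studied'].corr(df_study['exam_score'])
print(f"Correlation between hours studied and exam score: {r:.2f}")
```

If you edit exam_score so that some students who studied a lot score low, re-running this cell should show the value moving away from +1.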

Describe the scatter plot in your own words. What pattern did you see? What do you think would happen to the dots if studying had no effect on exam scores at all?

Check Your Understanding

{ "question_type": "multiple_choice", "question": "Which chart type is best for showing how a value changes over time?", "options": [ { "key": "a", "text": "Line chart" }, { "key": "b", "text": "Bar chart" }, { "key": "c", "text": "Scatter plot" }, { "key": "d", "text": "Histogram" } ], "answer": "a", "submitted_answer": "" }

{ "question_type": "true_false", "question": "Matplotlib is used to load and clean data.", "answer": "False", "submitted_answer": "" }

{ "question_type": "multiple_choice", "question": "What does a scatter plot show?", "options": [ { "key": "a", "text": "How data changes over time" }, { "key": "b", "text": "The total of each category" }, { "key": "c", "text": "The relationship between two variables" }, { "key": "d", "text": "How many values fall in each range" } ], "answer": "c", "submitted_answer": "" }