Day 2: Working With Data

Recap

  • What is AI?
  • Is it AI or not?
  • Types of AI: Supervised ML, Computer Vision, and Generative AI
  • Using Notebooks
  • “AI in Action” Notebook

Start With Data!

Start With Data!

  • Traditional software: a programmer writes every rule by hand
  • AI/ML: you feed the system examples, and it finds the patterns itself
  • Think of teaching a child to recognize a dog — not by listing rules, but by showing thousands of pictures
  • Those examples are the data
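
To make the contrast concrete, here is a tiny sketch in plain Python. Everything in it is invented for illustration: the four example messages, the function names, and the very simple word-counting "learning" step, which is far cruder than a real spam filter.

```python
from collections import Counter

# Invented labelled examples: this is the "data".
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("lunch at noon tomorrow", "not spam"),
    ("project notes for tomorrow", "not spam"),
]

def rule_based(message):
    # Traditional software: a programmer chose this one rule by hand.
    return "spam" if "free" in message.split() else "not spam"

def learn_word_scores(labelled):
    # "Learning": each word gains a point when seen in spam
    # and loses a point when seen in a normal message.
    scores = Counter()
    for text, label in labelled:
        for word in text.split():
            scores[word] += 1 if label == "spam" else -1
    return scores

def learned(message, scores):
    # Words never seen in the examples count as 0.
    total = sum(scores[word] for word in message.split())
    return "spam" if total > 0 else "not spam"

scores = learn_word_scores(examples)
print(rule_based("click now to win"))       # the hand-written rule misses it
print(learned("click now to win", scores))  # the learned scores catch it
```

The hand-written rule only knows about the word "free". The learned version picked up "click", "now", and "win" from the examples without anyone listing them.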

No Data, No AI

  • Spam filter → millions of emails labelled “spam” or “not spam”
  • Music app recommendations → millions of listening sessions
  • ChatGPT → a huge amount of text, much of it from the public internet
  • More data + better data usually = a better model
  • Today we learn to work with data, because that’s where AI starts

Introducing Pandas

What is Pandas?

  • A Python library for working with structured data — think spreadsheets, tables, CSVs
  • Lets you load, clean, filter, and analyze data with just a few lines of code
  • One of the most widely used tools in data science and AI
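
A minimal sketch of the load, clean, filter, analyze steps. The CSV text, the column names, and the numbers are all made up for this example; a real file would be loaded by passing its path to `pd.read_csv`.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a real file on disk.
csv_text = """station,month,reading
station_a,Jan,250
station_a,Jul,20
station_b,Jan,
station_b,Jul,15
"""

df = pd.read_csv(io.StringIO(csv_text))   # load
df = df.dropna(subset=["reading"])        # clean: drop the row with a missing reading
january = df[df["month"] == "Jan"]        # filter: keep only the January rows
mean_reading = df["reading"].mean()       # analyze: average of the remaining readings

print(df)
print(january)
print(mean_reading)
```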

A Brief History of Pandas

  • Created by Wes McKinney in 2008 while working at AQR Capital Management, an investment firm
  • He was frustrated that existing tools made data analysis too slow and painful
  • Released as open source — now downloaded hundreds of millions of times a year
  • The name comes from “Panel Data”, an economics term for multi-dimensional datasets

Key Pandas Concepts

  • DataFrame — a table of data with rows and columns, like a spreadsheet
  • Series — a single column of data
  • Can read CSV files, Excel files, databases, JSON, and more
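
A short sketch of the two core objects. The names and ages below are invented sample values.

```python
import pandas as pd

# A DataFrame: a table with rows and named columns.
df = pd.DataFrame({
    "name": ["Bat", "Saraa", "Tuya"],
    "age": [15, 16, 15],
})

# Selecting one column gives back a Series.
ages = df["age"]

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(df.shape)             # (rows, columns)
```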

Demo

The “Using Pandas” Notebook (pandas.ipynb)

Hands-On

The “Using Pandas” Notebook (pandas.ipynb)

Beyond Boring Tables!

Introducing Matplotlib

What is Matplotlib?

  • A Python library for creating charts and visualizations
  • Line charts, bar charts, scatter plots, histograms, heatmaps, and more
  • Turns raw data into pictures — making patterns much easier to spot
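
As a quick illustration, the sketch below turns four made-up numbers into a bar chart. The `Agg` backend line is only there so the code also runs on a machine without a screen; in a notebook the chart normally appears under the cell.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs without a display
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
values = [12, 9, 7, 3]  # made-up numbers

plt.bar(months, values)
plt.title("Made-up monthly values")
plt.ylabel("Value")
```

The downward trend is visible at a glance in the chart, which is harder to see in the raw list of numbers.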

A Brief History of Matplotlib

  • Created by John D. Hunter in 2003, originally for neuroscience research
  • He wanted Python to produce charts like MATLAB, without needing MATLAB
  • Hunter died in 2012, but the project continues as a major open-source effort
  • Today it underpins many other visualization libraries (Seaborn, Pandas plots, etc.)

Key Matplotlib Concepts

  • Figure — the entire canvas for your chart
  • Axes — the plot area where your data is actually drawn
  • Usually paired with Pandas: load data with Pandas, visualize it with Matplotlib
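
The sketch below shows the Figure and Axes objects explicitly, with the data held in a Pandas DataFrame. The column names and readings are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented data held in a DataFrame.
df = pd.DataFrame({"hour": [0, 6, 12, 18], "reading": [40, 55, 20, 35]})

# fig is the Figure (the whole canvas); ax is the Axes (the plot area).
fig, ax = plt.subplots()
ax.plot(df["hour"], df["reading"], marker="o")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Reading (made-up units)")
ax.set_title("One Axes inside one Figure")
```

Labels and titles are set on the Axes, while saving or resizing is done on the Figure.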

Demo

The “Using Matplotlib” Notebook (matplotlib.ipynb)

Hands-On

The “Using Matplotlib” Notebook (matplotlib.ipynb)

Ulaanbaatar’s Air Quality

Ulaanbaatar’s Air Quality

  • Ulaanbaatar regularly records PM2.5 levels 30–40× the WHO safe limit
  • Worse than Beijing, worse than Delhi
  • Let’s find out more…

Video

https://www.youtube.com/watch?v=xySwAUt6ojQ

What Causes This?

Your job today is to figure out why, using pandas and matplotlib!

Demo

Open and run the “Exploring Air Quality” Notebook (air_quality.ipynb)

Hands-On

Open and run the “Exploring Air Quality” Notebook (air_quality.ipynb)

Reflection

What did you learn? Did anything surprise you?

Introducing Correlation

What is Correlation?

  • When two variables tend to move together, we say they are correlated
  • More hours of sleep → better test scores = positive correlation
  • Higher temperature → lower PM2.5 in Ulaanbaatar = negative correlation
  • We measure the strength and direction with a number: the correlation coefficient
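
A minimal sketch of measuring this with Pandas. All numbers are invented for illustration and are not real measurements. `Series.corr` computes Pearson's correlation coefficient by default.

```python
import pandas as pd

# Invented numbers: more sleep goes with higher scores.
students = pd.DataFrame({
    "sleep_hours": [5, 6, 7, 8, 9],
    "test_score": [60, 65, 75, 80, 90],
})
r_sleep = students["sleep_hours"].corr(students["test_score"])

# Invented numbers: higher temperature goes with lower PM2.5.
air = pd.DataFrame({
    "temperature_c": [-30, -20, -10, 0, 10, 20],
    "pm25": [300, 220, 150, 90, 40, 20],
})
r_air = air["temperature_c"].corr(air["pm25"])

print(r_sleep)  # positive, close to +1
print(r_air)    # negative, close to -1
```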

The Correlation Coefficient

  • Always falls between -1 and +1
    • +1 = perfect positive relationship (both always move together)
    • 0 = no linear relationship
    • -1 = perfect negative relationship (one goes up, the other always goes down)
  • But be careful: correlation doesn’t mean causation!
  • Ice cream sales and sunburn rates are both high in summer! Does ice cream cause sunburn?
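
To see where the numbers come from, here is a sketch that computes the coefficient by hand in plain Python. It assumes the coefficient meant here is Pearson's r, which is what Pandas uses by default; the function name `pearson_r` and the sample values are made up for this example.

```python
import math

def pearson_r(xs, ys):
    # Compare how far each value sits from its own average.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return together / (spread_x * spread_y)

xs = [-2, -1, 0, 1, 2]
up = pearson_r(xs, [2 * x + 1 for x in xs])   # straight line going up
down = pearson_r(xs, [-3 * x for x in xs])    # straight line going down
curve = pearson_r(xs, [x ** 2 for x in xs])   # U-shaped curve

print(round(up, 6), round(down, 6), round(curve, 6))
```

The two straight lines give values at the ends of the range. The U-shaped curve gives 0 even though y clearly depends on x, because the coefficient only measures how well a straight line fits.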

Demo

Open and run the “Correlation” Notebook (correlation.ipynb)

Hands-On

Open and run the “Correlation” Notebook (correlation.ipynb)

Reflection

What did you learn? Did anything surprise you?

Tomorrow

  • Introduce scikit-learn
  • Train our first AI model!
  • Explore a dataset about Dzud