Recap
- What is AI?
- Is it AI or not?
- Types of AI: Supervised ML, Computer Vision, and Generative AI
- Using Notebooks
- “AI in Action” Notebook
Start With Data!
- Traditional software: a programmer writes every rule by hand
- AI/ML: you feed the system examples, and it finds the patterns itself
- Think of teaching a child to recognize a dog — not by listing rules, but by showing thousands of pictures
- Those examples are the data
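A minimal sketch of the contrast above, in plain Python. The example messages and the word-counting "learner" are invented for illustration; real spam filters are far more sophisticated.

```python
from collections import Counter

# Traditional software: a programmer writes the rule by hand.
def rule_based_is_spam(message: str) -> bool:
    return "free money" in message.lower()

# AI/ML style: the behaviour is derived from labelled examples (the data).
# These four examples are made up for illustration.
examples = [
    ("win a free prize now", "spam"),
    ("free prize waiting for you", "spam"),
    ("lunch at noon tomorrow", "not spam"),
    ("notes from the meeting tomorrow", "not spam"),
]

def learn_word_counts(labelled):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in labelled:
        counts[label].update(text.lower().split())
    return counts

def learned_is_spam(message: str, counts) -> bool:
    """Guess 'spam' if the words were seen more often in spam examples."""
    words = message.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    return spam_score > ham_score

counts = learn_word_counts(examples)
print(rule_based_is_spam("Claim your prize"))       # the hand-written rule misses this
print(learned_is_spam("claim your prize", counts))  # the learned counts flag it
```

Adding more labelled examples changes what the second function does without anyone rewriting its code, which is the point of starting with data.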
No Data, No AI
- Spam filter → millions of emails labelled “spam” or “not spam”
- Music app recommendations → millions of listening sessions
- ChatGPT → a large share of the text on the public internet
- More data + better data = smarter model
- Today we learn to work with data, because that’s where AI starts
What is Pandas?
- A Python library for working with structured data — think spreadsheets, tables, CSVs
- Lets you load, clean, filter, and analyze data with just a few lines of code
- One of the most widely used tools in data science and AI
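As a rough illustration of "load, clean, filter, and analyze", here is a short sketch. The table of students is made up, and the CSV text is read from a string so the example runs without a file on disk.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a real file.
csv_text = """name,hours_sleep,test_score
Anu,8,90
Bat,6,
Saraa,7,82
Temuulen,5,64
"""

df = pd.read_csv(io.StringIO(csv_text))      # load
clean = df.dropna(subset=["test_score"])     # clean: drop rows with a missing score
rested = clean[clean["hours_sleep"] >= 7]    # filter: keep students with 7+ hours of sleep
average = rested["test_score"].mean()        # analyze: average score of that group
print(average)
```

With a real file, `pd.read_csv("some_file.csv")` would replace the `io.StringIO` line; the file name here is hypothetical.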
A Brief History of Pandas
- Created by Wes McKinney in 2008 while working at AQR Capital Management, an investment firm
- He was frustrated that existing tools made data analysis too slow and painful
- Released as open source — now downloaded hundreds of millions of times a year
- The name comes from “Panel Data”, an economics term for multi-dimensional datasets
Key Pandas Concepts
- DataFrame — a table of data with rows and columns, like a spreadsheet
- Series — a single column of data
- Can read CSV files, Excel files, databases, JSON, and more
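A small sketch of how the two concepts relate, using invented values: selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A DataFrame built from a dictionary; the values are made up for illustration.
df = pd.DataFrame({
    "name": ["Anu", "Bat", "Saraa"],
    "hours_sleep": [8, 6, 7],
})

sleep = df["hours_sleep"]      # selecting one column gives a Series

print(type(df).__name__)       # the whole table
print(type(sleep).__name__)    # a single column
print(df.shape)                # (number of rows, number of columns)
```

For data stored in files, pandas provides readers such as `pd.read_csv`, `pd.read_excel`, and `pd.read_json`, each of which returns a DataFrame.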
Demo
The “Using Pandas” Notebook (pandas.ipynb)
Hands-On
The “Using Pandas” Notebook (pandas.ipynb)
What is Matplotlib?
- A Python library for creating charts and visualizations
- Line charts, bar charts, scatter plots, histograms, heatmaps, and more
- Turns raw data into pictures — making patterns much easier to spot
A Brief History of Matplotlib
- Created by John D. Hunter in 2003, originally for neuroscience research
- He wanted Python to produce charts like MATLAB, without needing MATLAB
- Hunter died in 2012, but the project continues as a major open-source effort
- Today it underpins many other visualization libraries (Seaborn, Pandas plots, etc.)
Key Matplotlib Concepts
- Figure — the entire canvas for your chart
- Axes — the plot area where your data is actually drawn
- Usually paired with Pandas: load data with Pandas, visualize it with Matplotlib
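The Figure and Axes ideas can be sketched as follows, assuming a small made-up table held in a pandas DataFrame. The `matplotlib.use("Agg")` line is only there so the sketch also runs outside a notebook; inside a notebook it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen drawing; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "hours_sleep": [5, 6, 7, 8],
    "test_score": [64, 70, 82, 90],
})

fig, ax = plt.subplots()                         # fig is the Figure (canvas), ax is the Axes (plot area)
ax.scatter(df["hours_sleep"], df["test_score"])  # draw the data on the Axes
ax.set_xlabel("Hours of sleep")
ax.set_ylabel("Test score")
ax.set_title("Sleep vs. test score (made-up data)")
```

In a notebook the chart typically appears under the cell; in a script, `fig.savefig(...)` or `plt.show()` would be needed to see it.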
Demo
The “Using Matplotlib” Notebook (matplotlib.ipynb)
Hands-On
The “Using Matplotlib” Notebook (matplotlib.ipynb)
Ulaanbaatar’s Air Quality
- Ulaanbaatar regularly records PM2.5 levels 30–40× the WHO safe limit
- Worse than Beijing, worse than Delhi
- Let’s find out more…
What Causes This?
Your job today is to figure out why, using pandas and matplotlib!
Demo
Open and run the “Exploring Air Quality” Notebook (air_quality.ipynb)
Hands-On
Open and run the “Exploring Air Quality” Notebook (air_quality.ipynb)
Reflection
What did you learn? Did anything surprise you?
What is Correlation?
- When two variables tend to move together, we say they are correlated
- More hours of sleep → better test scores = positive correlation
- Higher temperature → lower PM2.5 in Ulaanbaatar = negative correlation
- We measure the strength and direction with a number: the correlation coefficient
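One way to get that number in pandas is the `corr` method of a Series, which computes the Pearson correlation by default. The sleep and score values below are invented for illustration.

```python
import pandas as pd

# Made-up data: more sleep tends to go with higher scores.
df = pd.DataFrame({
    "hours_sleep": [5, 6, 7, 8],
    "test_score": [64, 70, 82, 90],
})

r = df["hours_sleep"].corr(df["test_score"])  # Pearson correlation coefficient
print(round(r, 2))                            # positive and close to +1 for this data
```

Calling `df.corr()` on the whole table would instead return the coefficient for every pair of numeric columns.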
The Correlation Coefficient
- Always falls between -1 and +1
- +1 = perfect positive relationship (both always move together)
- 0 = no linear relationship
- -1 = perfect inverse relationship (one goes up, the other always goes down)
- But be careful: correlation doesn’t mean causation!
- Ice cream sales and sunburn rates are both high in summer! Does ice cream cause sunburn? No: sunny weather drives both
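To make the range concrete, here is a hand-written sketch of the Pearson correlation coefficient in plain Python, tried on three tiny invented datasets. The function name `pearson_r` is our own; in practice a library call such as pandas' `corr` would normally be used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

x = [1, 2, 3, 4, 5]
up = pearson_r(x, [2, 4, 6, 8, 10])     # y rises in step with x: perfect positive
down = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in step with x: perfect inverse
v_shape = pearson_r(x, [2, 1, 0, 1, 2]) # y falls, then rises: close to 0
print(up, down, v_shape)
```

The V-shaped case shows that a coefficient near 0 only rules out a straight-line relationship; the two variables there are clearly related, just not linearly.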
Demo
Open and run the “Correlation” Notebook (correlation.ipynb)
Hands-On
Open and run the “Correlation” Notebook (correlation.ipynb)
Reflection
What did you learn? Did anything surprise you?
Tomorrow
- Introduce scikit-learn
- Train our first AI model!
- Explore a dataset about dzud, the severe Mongolian winters that kill large numbers of livestock