Pose Landmarks with Computer Vision

In this notebook, you’ll use a neural network to detect the positions of 33 body landmarks in real time. By tracking joints like shoulders, elbows, and wrists, you can understand how a person is moving without knowing who they are.

What Is Pose Estimation?

Pose estimation detects the position of specific points on a person’s body — called landmarks or keypoints. The result is a skeleton-like overlay showing where joints are located in the image.

This platform uses MediaPipe Pose, which tracks 33 landmarks including: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

Each landmark has:
- x, y: position on screen (pixels)
- z: estimated depth (how far the point is from the camera)
- visibility: how confident the model is that this point is visible (0.0 to 1.0)
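
To make these fields concrete, here is a minimal sketch of one landmark as a plain Python dictionary. The values are invented for illustration. The dictionary layout is an assumption based on how the later cells read landmarks (lm['x'], lm['visibility']); the key name 'z' and the helper describe are assumed for this example and are not part of the cv module.

```python
# Hypothetical sample landmark; every value here is made up for illustration.
left_wrist = {"x": 412, "y": 188, "z": -0.35, "visibility": 0.97}

def describe(name, lm):
    """Format one landmark's fields as a single readable line."""
    return (f"{name}: x={lm['x']}, y={lm['y']}, "
            f"z={lm['z']:.2f}, visible={lm['visibility']:.0%}")

print(describe("Left Wrist", left_wrist))
# Left Wrist: x=412, y=188, z=-0.35, visible=97%
```

This runs without a camera, so it is a quick way to get used to the field names before working with live data.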

Step 1: Live Pose Detection

The cell below starts your camera and draws a skeleton overlay on your body in real time. Step back so your full upper body is visible — the model works best when it can see your shoulders, arms, and head.

Click Allow when your browser asks for webcam access, then click Stop (■) when you’re ready to move on.

import cv
import graphics
import time

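# Set up the drawing surface, the camera feed, and the pose detector.
canvas = graphics.canvas()
camera = cv.start_camera(canvas)
detector = cv.start_pose_detector(camera)

try:
  while True:
    # Get the latest detected poses and draw the skeleton overlay.
    poses = detector.get_detections()
    canvas.draw_poses(poses)
    # Pause about 33 ms, which targets roughly 30 updates per second.
    time.sleep(0.033)
finally:
  # Always release the detector and camera, even when the cell is stopped.
  detector.stop()
  camera.stop()
  print("Camera stopped.")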

Step 2: Explore Landmark Data

The cell below runs the same live skeleton overlay, but also prints the coordinates of key joints about once per second (every 30th pass through the loop). Each landmark is accessed using a cv.POSE constant, for example cv.POSE.LEFT_WRIST.

Move your arms, lean in, or step back and watch the numbers change.

Tip: In image coordinates, y=0 is the top of the screen and increases downward.

Click Stop (■) when you’re done.

import cv
import graphics
import time

canvas = graphics.canvas()
camera = cv.start_camera(canvas)
detector = cv.start_pose_detector(camera)

key_points = [
  ("Nose",           cv.POSE.NOSE),
  ("Left Shoulder",  cv.POSE.LEFT_SHOULDER),
  ("Right Shoulder", cv.POSE.RIGHT_SHOULDER),
  ("Left Wrist",     cv.POSE.LEFT_WRIST),
  ("Right Wrist",    cv.POSE.RIGHT_WRIST),
]

frame = 0
try:
  while True:
    poses = detector.get_detections()
    canvas.draw_poses(poses)
    # Print only on every 30th pass through the loop (about once per second).
    if frame % 30 == 0:
      if poses:
        pose = poses[0]
        print("--- Key landmarks ---")
        for name, idx in key_points:
          lm = pose[idx]
          print(f"  {name:16}: x={lm['x']}, y={lm['y']}  (visible: {lm['visibility']:.0%})")
      else:
        print("No pose detected — step back so more of your body is visible.")
    frame += 1
    time.sleep(0.033)
finally:
  detector.stop()
  camera.stop()
  print("Camera stopped.")
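
Coordinates for a landmark with a low visibility value are less trustworthy than the rest. As a minimal sketch, the printed list could be filtered by visibility first. The example below uses hand-made sample data instead of the camera; the helper name visible_landmarks and the 0.5 threshold are assumptions chosen for illustration, not part of the cv module.

```python
def visible_landmarks(named_landmarks, min_visibility=0.5):
    """Keep only (name, landmark) pairs at or above the visibility threshold."""
    return [(name, lm) for name, lm in named_landmarks
            if lm["visibility"] >= min_visibility]

# Hypothetical sample data standing in for one detected pose.
sample = [
    ("Nose",        {"x": 320, "y": 120, "visibility": 0.99}),
    ("Left Wrist",  {"x": 410, "y": 300, "visibility": 0.20}),
    ("Right Wrist", {"x": 230, "y": 310, "visibility": 0.85}),
]

for name, lm in visible_landmarks(sample):
    print(f"  {name:16}: x={lm['x']}, y={lm['y']}")
```

With this sample, the left wrist is skipped because its visibility (0.20) falls below the threshold.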

Step 3: Experiment — Arm Raise Detector

Now let’s write logic on top of the landmark data. If your wrist is higher than your shoulder, your arm is raised. Since y=0 is the top of the screen, a higher position means a smaller y value.

Run the cell, then try raising one or both arms. Watch the status update in the output. Click Stop (■) when you’re done.

import cv
import graphics
import time

canvas = graphics.canvas()
camera = cv.start_camera(canvas)
detector = cv.start_pose_detector(camera)

frame = 0
try:
  while True:
    poses = detector.get_detections()
    canvas.draw_poses(poses)
    if frame % 30 == 0:
      if poses:
        pose       = poses[0]
        l_wrist    = pose[cv.POSE.LEFT_WRIST]
        l_shoulder = pose[cv.POSE.LEFT_SHOULDER]
        r_wrist    = pose[cv.POSE.RIGHT_WRIST]
        r_shoulder = pose[cv.POSE.RIGHT_SHOULDER]
        # Smaller y is higher on screen, so a raised wrist has a smaller y than its shoulder.
        left_raised  = l_wrist['y'] < l_shoulder['y']
        right_raised = r_wrist['y'] < r_shoulder['y']
        print(f"Left arm: {'RAISED' if left_raised else 'at side'}  |  "
            f"Right arm: {'RAISED' if right_raised else 'at side'}")
      else:
        print("No pose detected.")
    frame += 1
    time.sleep(0.033)
finally:
  detector.stop()
  camera.stop()
  print("Camera stopped.")
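
The wrist-versus-shoulder comparison can also be pulled into a small function and tried on sample data without a camera. This is a sketch under stated assumptions: the function name arm_status, the extra 'unknown' result for low-visibility landmarks, and the 0.5 threshold are additions for illustration and are not part of the cell above or the cv module.

```python
def arm_status(wrist, shoulder, min_visibility=0.5):
    """Return 'RAISED', 'at side', or 'unknown' from two landmark dicts."""
    # If either joint is poorly visible, the comparison is not reliable.
    if wrist["visibility"] < min_visibility or shoulder["visibility"] < min_visibility:
        return "unknown"
    # y=0 is the top of the screen, so a raised wrist has the smaller y.
    return "RAISED" if wrist["y"] < shoulder["y"] else "at side"

# Hypothetical landmarks: the wrist is above the shoulder on screen.
shoulder = {"x": 300, "y": 200, "visibility": 0.95}
wrist_up = {"x": 310, "y": 100, "visibility": 0.90}
wrist_down = {"x": 310, "y": 350, "visibility": 0.90}
wrist_hidden = {"x": 310, "y": 100, "visibility": 0.10}

print(arm_status(wrist_up, shoulder))      # RAISED
print(arm_status(wrist_down, shoulder))    # at side
print(arm_status(wrist_hidden, shoulder))  # unknown
```

Keeping the rule in one function makes it easy to test different thresholds before running it on live landmarks.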

Reflect

Record your observations in the journal cells below.

Imagine you could only see the skeleton data — no video, no face, no identity. What could you still figure out about the person? What couldn’t you tell?

Check for Understanding

{ "question_type": "true_false", "question": "MediaPipe Pose tracks 33 landmarks on the human body.", "answer": "True", "submitted_answer": "" }

{ "question_type": "multiple_choice", "question": "What does the visibility value tell you about a landmark?", "options": [ { "key": "a", "text": "Whether the landmark is currently moving" }, { "key": "b", "text": "How confident the model is that the landmark is visible" }, { "key": "c", "text": "The color of the landmark on screen" }, { "key": "d", "text": "How far the person is from the camera in meters" } ], "answer": "b", "submitted_answer": "" }

{ "question_type": "true_false", "question": "In image coordinates, a smaller y value means the point is higher on the screen.", "answer": "True", "submitted_answer": "" }

{ "question_type": "multiple_choice", "question": "Which constant would you use to get the position of a person’s left wrist?", "options": [ { "key": "a", "text": "cv.POSE.LEFT_HAND" }, { "key": "b", "text": "cv.POSE.LEFT_WRIST" }, { "key": "c", "text": "cv.POSE.WRIST_LEFT" }, { "key": "d", "text": "cv.POSE.LEFT_PALM" } ], "answer": "b", "submitted_answer": "" }