Image-to-Image Transformation

In this notebook, we capture a photo from the camera and use an image model to transform it into a different artistic style. Unlike text-to-image, this model takes an existing image plus a text instruction and uses both to produce the result.

import openai
import os
import base64
from PIL import Image
from io import BytesIO
from IPython.display import display

client = openai.OpenAI(
  base_url='https://openrouter.ai/api/v1',
  api_key=os.environ["OPENROUTER_API_KEY"]
)

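def display_image(image_url):
  """Decode a base64 image (data URL or bare base64 string) and show it inline."""
  encoded = image_url
  if encoded.startswith("data:"):
    # Drop the "data:<type>;base64," header and keep only the payload.
    encoded = encoded.split(",", 1)[1]
  image_data = base64.b64decode(encoded)
  image = Image.open(BytesIO(image_data))
  display(image)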
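
The helper above relies on the layout of a base64 data URL: a `data:` prefix, a media type, a `;base64` marker, a comma, and then the encoded bytes. As a minimal sketch of that layout (the function name `split_data_url` is hypothetical and not part of this notebook), the pieces can be pulled apart with the standard library alone:

```python
import base64

def split_data_url(data_url):
    """Return (mime_type, raw_bytes) for a base64-encoded data URL."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    # Everything before the first comma is the header, the rest is the payload.
    header, payload = data_url[len("data:"):].split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URLs are handled here")
    # The media type is the first ';'-separated field; it may be empty.
    mime_type = header.split(";")[0] or "text/plain"
    return mime_type, base64.b64decode(payload)

mime_type, raw = split_data_url("data:text/plain;base64,aGVsbG8=")
print(mime_type, raw)  # text/plain b'hello'
```

For an image data URL, the returned bytes are what `Image.open(BytesIO(...))` receives in `display_image`.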

Step 1: Capture a Photo

Run the cell below to take a photo using your camera. The countdown gives you 3 seconds to get into position. Point the camera at something interesting: an object, a scene, or yourself.

import cv
import graphics
import time

canvas = graphics.canvas()
camera = cv.start_camera(canvas)

w = canvas.get_width()
h = canvas.get_height()
ctx = canvas.get_context('2d')

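for count in ["3", "2", "1"]:
  # Dim the canvas with a translucent black overlay.
  ctx.fill_style = "rgba(0, 0, 0, 0.5)"
  ctx.fill_rect(0, 0, w, h)
  # Draw the countdown digit centered on the canvas.
  ctx.font = "bold 200px sans-serif"
  ctx.fill_style = "white"
  ctx.text_align = "center"
  ctx.text_baseline = "middle"
  ctx.fill_text(count, w / 2, h / 2)
  time.sleep(1)
  # Clear the overlay before the next digit.
  canvas.clear()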

data_url = cv.capture_frame(camera)
camera.stop()
print("Photo captured!")

Step 2: Transform the Image

Now we send the captured photo to an image editing model along with a style instruction. The model uses your photo as a starting point, aiming to preserve its content while rendering it in a different style.

Choose a transformation from the dropdown and run the cell.

TRANSFORMATION = "Transform this into an oil painting" #@param ["Transform this into an oil painting", "Convert this into watercolor art", "Redraw this in anime style", "Make this look like a pencil sketch", "Transform this into pixel art"]

print(f"Applying: {TRANSFORMATION}...")

response = client.chat.completions.create(
  model="black-forest-labs/flux.2-klein-4b",
  messages=[{
    "role": "user",
    "content": [
      # The captured photo and the style instruction travel in one user message.
      {"type": "image_url", "image_url": {"url": data_url}},
      {"type": "text", "text": TRANSFORMATION}
    ]
  }],
  # Request image output from the model.
  extra_body={"modalities": ["image"]}
)

transformed_url = response.choices[0].message.images[0]["image_url"]["url"]
display_image(transformed_url)
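
The indexing in the cell above assumes the reply always contains at least one image. If the model answers with text only, that line raises an error. The sketch below is one possible guard. The helper name `first_image_url` is hypothetical, and the stand-in objects only mimic the response shape used in this notebook; they are not taken from the API documentation.

```python
from types import SimpleNamespace

def first_image_url(response):
    """Return the first generated image URL, or None if the reply has no image."""
    if not response.choices:
        return None
    message = response.choices[0].message
    # Assumption: generated images live in a list under message.images,
    # as in the notebook cell; the attribute may be missing on text-only replies.
    images = getattr(message, "images", None) or []
    if not images:
        return None
    return images[0]["image_url"]["url"]

# Stand-in responses for illustration only (no network call is made).
fake_ok = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(
    images=[{"image_url": {"url": "data:image/png;base64,AAAA"}}], content=""))])
fake_empty = SimpleNamespace(choices=[SimpleNamespace(
    message=SimpleNamespace(content="No image was produced."))])

print(first_image_url(fake_ok))     # data:image/png;base64,AAAA
print(first_image_url(fake_empty))  # None
```

In the notebook, the result could be checked for `None` before calling `display_image`, so that a text-only reply produces a readable message instead of a traceback.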

Compare the original photo with the transformed version.
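
One way to make the comparison easier is to place both images on a single canvas. The following is a minimal sketch using Pillow, which this notebook already imports. The helper name `side_by_side` and the solid-color stand-in images are assumptions for illustration; in the notebook, the two inputs would be the images decoded from `data_url` and `transformed_url`.

```python
from PIL import Image

def side_by_side(left, right, gap=10, background="white"):
    """Paste two PIL images next to each other on one canvas."""
    # Scale the right image to the left image's height so the two line up.
    if right.height != left.height:
        new_width = max(1, round(right.width * left.height / right.height))
        right = right.resize((new_width, left.height))
    canvas = Image.new("RGB", (left.width + gap + right.width, left.height), background)
    canvas.paste(left.convert("RGB"), (0, 0))
    canvas.paste(right.convert("RGB"), (left.width + gap, 0))
    return canvas

# Stand-in images so the sketch runs without a camera or an API call.
original = Image.new("RGB", (64, 48), "steelblue")
transformed = Image.new("RGB", (128, 96), "orange")
comparison = side_by_side(original, transformed)
print(comparison.size)  # (138, 48)
```

Passing `comparison` to `display` would show the pair inline, original on the left and transformed on the right.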

How well did the model preserve the content of your photo while changing the style? What details changed that you didn’t expect? Which transformation did you find most impressive, and why?

{
  "question_type": "true_false",
  "question": "An image-to-image model takes an existing image as input and can transform it based on a text instruction.",
  "answer": "True",
  "submitted_answer": ""
}

{
  "question_type": "multiple_choice",
  "question": "What makes image-to-image generation different from text-to-image generation?",
  "options": [
    { "key": "a", "text": "It doesn't use the diffusion process" },
    { "key": "b", "text": "It takes an existing image as additional input alongside the text instruction" },
    { "key": "c", "text": "It can only produce black and white output" },
    { "key": "d", "text": "It generates images much faster" }
  ],
  "answer": "b",
  "submitted_answer": ""
}