Structured Output: Mesh Generator

In this example, we use structured output to generate a 3D scene from a text description. The model returns a structured list of mesh objects, which are then rendered in a 3D scene viewer.

import openai
import os
from typing import Optional, Literal
from pydantic import BaseModel

class Mesh(BaseModel):
    type: Literal["Box", "Sphere", "Cylinder"]
    position: list[float] = [0.0, 0.0, 0.0]
    rotation: list[float] = [0.0, 0.0, 0.0]
    color: Optional[str] = None
    width: float = 1.0
    height: float = 1.0
    depth: float = 1.0
    diameter: float = 1.0

class SceneDescription(BaseModel):
    meshes: list[Mesh]

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)

MODEL = "openai/gpt-5-nano" #@param ["openai/gpt-5-nano"]

SYSTEM_PROMPT = """
You are a 3D scene designer. Generate scenes using simple 3D mesh primitives.

COORDINATE SYSTEM
- Y is up. The ground plane is at Y=0.
- A Box with height H has its center at Y = H/2 to sit flush on the ground.
- Z increases away from the default camera position (which looks from negative Z).

SHAPES AND PARAMS
- Box: width (X), height (Y), depth (Z). Default 1 each.
- Sphere: diameter. Default 1.
- Cylinder: diameter, height. Default 1 each. Oriented vertically.

COLORS
- RGB Hex code strings: e.g., "#ff0000" for red.
"""

response = client.beta.chat.completions.parse(
  model=MODEL,
  messages=[
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Generate me a house in the middle of a forest"},
  ],
  response_format=SceneDescription
)

print("Response received")
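
Passing `response_format=SceneDescription` asks the SDK to constrain the reply to the JSON schema derived from the Pydantic model and to parse it back into that model. The check itself can be tried locally without an API call. The following is a minimal sketch, assuming Pydantic v2; the `Mesh` class is a trimmed copy of the one above, and both JSON strings are hand-written stand-ins for model output, not real responses.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class Mesh(BaseModel):
    # Trimmed copy of the Mesh model above (fewer fields, for brevity).
    type: Literal["Box", "Sphere", "Cylinder"]
    position: list[float] = [0.0, 0.0, 0.0]
    color: Optional[str] = None
    height: float = 1.0


class SceneDescription(BaseModel):
    meshes: list[Mesh]


# Hypothetical model outputs, written by hand for illustration.
good_json = '{"meshes": [{"type": "Box", "position": [0, 0.5, 0], "color": "#8b4513"}]}'
bad_json = '{"meshes": [{"type": "Pyramid"}]}'

# Valid JSON is parsed into typed objects; omitted fields take their defaults.
scene = SceneDescription.model_validate_json(good_json)
print(scene.meshes[0].type, scene.meshes[0].height)

# "Pyramid" is not one of the Literal values, so validation fails.
try:
    SceneDescription.model_validate_json(bad_json)
    rejected = False
except ValidationError:
    rejected = True
print("bad_json rejected:", rejected)
```

The same `Literal` and type annotations that document the schema for the model also act as the validator on the way back in.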

Let’s take a look at the meshes returned by the model:

print(response.choices[0].message.parsed.meshes)
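
Printing the raw list can be hard to read for larger scenes. As a rough sketch, a small helper can count the meshes by type instead. `summarize_meshes` is a hypothetical name, and the sample objects are stand-ins that only mimic the `type` attribute of `Mesh`; in the notebook you would pass `response.choices[0].message.parsed.meshes`, after checking that `parsed` is not `None`.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_meshes(meshes):
    """Return a mapping from mesh type to how many meshes have that type."""
    return dict(Counter(m.type for m in meshes))


# Stand-in objects with the same attribute name as the Mesh model above.
sample = [
    SimpleNamespace(type="Box"),
    SimpleNamespace(type="Cylinder"),
    SimpleNamespace(type="Sphere"),
    SimpleNamespace(type="Cylinder"),
]

summary = summarize_meshes(sample)
print(summary)
```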

And bring these meshes into a 3D scene…

import scene3d

scene = scene3d.Scene()
scene.set_sky(scene3d.Sky.CLOUDS)
scene.import_meshes(response.choices[0].message.parsed.meshes)
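
The ground-plane convention from the system prompt (Y is up, ground at Y=0, a Box of height H centered at Y = H/2) also gives a quick sanity check on the generated meshes. The sketch below flags meshes that dip below the ground. `bottom_y` and `below_ground` are hypothetical helpers; the sketch assumes a Sphere's vertical extent equals its diameter and ignores rotation, and the sample meshes are hand-written stand-ins.

```python
from types import SimpleNamespace


def bottom_y(mesh):
    """Lowest Y coordinate of an unrotated mesh, given its center position."""
    # Assumption: a Sphere's vertical extent is its diameter; Box and
    # Cylinder use height. Rotation is ignored in this sketch.
    extent = mesh.diameter if mesh.type == "Sphere" else mesh.height
    return mesh.position[1] - extent / 2


def below_ground(meshes, tolerance=1e-6):
    """Return the meshes whose lowest point is under the ground plane (Y=0)."""
    return [m for m in meshes if bottom_y(m) < -tolerance]


# Stand-ins with the same attribute names as the Mesh model above.
trunk = SimpleNamespace(type="Cylinder", position=[2.0, 1.0, 3.0], height=2.0, diameter=0.4)
sunk_box = SimpleNamespace(type="Box", position=[0.0, 0.0, 0.0], height=1.0, diameter=1.0)

print(bottom_y(trunk))
print(len(below_ground([trunk, sunk_box])))
```

Meshes above the ground are not flagged, since roofs or treetops legitimately rest on other shapes.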

You just used AI to generate a 3D scene from a text description!

What other scene would you want the AI to generate? Why was structured output necessary for this example — what do you think would have gone wrong if the model had returned a plain text description of the scene instead?

{ "question_type": "freeform", "question": "What Python library is used to define the Mesh and SceneDescription classes?", "answer": "pydantic", "submitted_answer": "" }

{ "question_type": "multiple_choice", "question": "What does using Literal[\"Box\", \"Sphere\", \"Cylinder\"] for the type field in the Mesh class do?", "options": [ { "key": "a", "text": "It makes the field optional" }, { "key": "b", "text": "It restricts the field to only those three allowed values" }, { "key": "c", "text": "It converts the value to lowercase" }, { "key": "d", "text": "It sets a default value of \"Box\"" } ], "answer": "b", "submitted_answer": "" }

{ "question_type": "true_false", "question": "The SceneDescription class contains a list of Mesh objects.", "answer": "True", "submitted_answer": "" }