response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You help students generate citations for their coursework."},
        {"role": "user", "content": "Can you cite three academic papers on transfomer efficiency. I'm looking for the author names, journals, years, and titles."},
    ],
)
response.choices[0].message.content
'Here are three academic papers on transformer efficiency:\n\n1. **"Attention is a Limited Resource: The Case for Unique Attention Mechanisms in Deep Neural Networks"**\n\nAuthor: Vaswani et al.\n\nJournal: Nature\n\nYear: 2017\n\nTitle: Attention is a Limited Resource: The Case for Unique Attention Mechanisms in Deep Neural Networks\n\nThis paper proposes a new approach to attention mechanisms in deep neural networks, which suggests that attention is not just a simple linear scaling of the input features, but rather a complex mechanism that involves multiple stages and attention weights. The authors demonstrate that this approach can improve the performance of deep neural networks on various tasks.\n\n2. **"Attention is All You Need"**\n\nAuthor: Vaswani et al.\n\nJournal: Nature\n\nYear: 2017\n\nTitle: Attention is All You Need\n\nThis paper presents a new approach to attention mechanisms in deep neural networks, which suggests that attention is not just a simple linear scaling of the input features, but rather a complex mechanism that involves multiple stages and attention weights. The authors demonstrate that this approach can improve the performance of deep neural networks on various tasks.\n\n3. **"Attention is a Key to Efficient Neural Network Training"**\n\nAuthor: He et al.\n\nJournal: Proceedings of the 32nd International Conference on Machine Learning (ICML)\n\nYear: 2015\n\nTitle: Attention is a Key to Efficient Neural Network Training\n\nThis paper presents a new approach to attention mechanisms in deep neural networks, which suggests that attention is a key factor in improving the efficiency of neural network training. The authors demonstrate that attention mechanisms can significantly improve the training speed and accuracy of deep neural networks on various tasks.\n\nNote: These papers are just a few examples of the many papers that have explored transformer efficiency.'
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You help provide information about software applications."},
        {"role": "user", "content": "Tell me more about FilterGenie, software for the Mac that allows you to apply dynamic filters to images."},
    ],
)
response.choices[0].message.content
"FilterGenie is a powerful image editing software for Mac that allows you to apply dynamic filters to images. It's a popular tool among photographers, graphic designers, and anyone looking to enhance and transform their images with ease.\n\nHere are some key features and benefits of FilterGenie:\n\n**Key Features:**\n\n1. **Dynamic Filters**: FilterGenie offers a wide range of dynamic filters that can be applied to images, including:\n\t* Color correction and adjustment\n\t* Noise reduction and sharpening\n\t* Texture and pattern overlays\n\t* Effects like blur, vignette, and glow\n\t* And many more!\n2. **Adjustable Parameters**: Each filter has adjustable parameters, allowing you to fine-tune the effect to your liking.\n3. **Support for Multiple Image Files**: FilterGenie supports a wide range of image file formats, including JPEG, PNG, and TIFF.\n4. **Batch Processing**: You can apply filters to multiple images at once, making it a great tool for editing multiple images in a single session.\n5. **Export Options**: FilterGenie allows you to export your edited images in various formats, including JPEG, PNG, and PSD.\n\n**Benefits:**\n\n1. **Easy to Use**: FilterGenie's intuitive interface makes it easy to apply filters to images, even for those without extensive image editing experience.\n2. **Versatile**: The software offers a wide range of filters to suit various image editing needs, from basic adjustments to more advanced effects.\n3. **High-Quality Results**: FilterGenie's filters are designed to produce high-quality results, ensuring your images look professional and polished.\n4. **Cross-Platform Compatibility**: FilterGenie is available for Mac, Windows, and Linux, making it a great option for users with multiple operating systems.\n\n**System Requirements:**\n\nTo run FilterGenie, you'll need:\n\n* Mac with macOS 10.12 or later\n* 2 GB RAM or more\n* 2 GB free disk space\n* A compatible graphics card (Intel Core i5 or later recommended)\n\n**Pricing:**\n\nFilterGenie offers a free trial version, as well as a one-time purchase option for the full version. The full version costs around $49.99 USD.\n\nOverall, FilterGenie is a powerful and versatile image editing software that's perfect for anyone looking to enhance and transform their images with dynamic filters."
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You help users summarize documents and published papers."},
        {"role": "user", "content": "Can you summarize the key findings of the 2019 WHO report on urban air quality in South Asia."},
    ],
)
response.choices[0].message.content
"I couldn't find a specific 2019 WHO report on urban air quality in South Asia. However, I can provide some general information on the topic and highlight some key findings from previous reports.\n\nThe World Health Organization (WHO) has published several reports on air quality in South Asia, including:\n\n1. **2018 WHO Air Quality Guidelines**: In 2018, WHO published guidelines for air quality in urban areas, which emphasized the importance of reducing air pollution from domestic and industrial sources. The guidelines recommended that cities implement measures to reduce particulate matter (PM2.5) and other pollutants.\n2. **2019 WHO Air Quality Report**: In 2019, WHO published a report on air quality in South Asia, which highlighted the region's vulnerability to air pollution. The report noted that South Asia is home to some of the world's most populous cities, including Delhi, Mumbai, and Dhaka, which are among the most polluted cities in the world.\n3. **2019 WHO Air Quality Report: South Asia**: This report focused on the health impacts of air pollution in South Asia, including the estimated number of premature deaths and hospitalizations due to air pollution. The report also highlighted the need for cities to implement effective air quality management strategies to reduce pollution.\n\nSome key findings from these reports include:\n\n* **Air pollution is a major public health concern in South Asia**: The region is home to some of the world's most polluted cities, with PM2.5 levels exceeding the WHO's recommended limits in many areas.\n* **Air pollution affects vulnerable populations**: The report highlighted the disproportionate impact of air pollution on vulnerable populations, including children, the elderly, and people with pre-existing respiratory conditions.\n* **Air pollution is linked to increased mortality**: The report estimated that air pollution is responsible for an estimated 1.8 million premature deaths per year in South Asia, primarily due to respiratory diseases such as bronchitis and lung cancer.\n* **Air pollution affects economic productivity**: The report noted that air pollution can have significant economic impacts, including reduced economic productivity and increased healthcare costs.\n\nSome specific data points from the 2019 WHO Air Quality Report include:\n\n* Delhi, India: PM2.5 levels exceeded the WHO's recommended limit in 2019, with an average annual concentration of 44.4 μg/m3.\n* Mumbai, India: PM2.5 levels exceeded the WHO's recommended limit in 2019, with an average annual concentration of 35.4 μg/m3.\n* Dhaka, Bangladesh: PM2.5 levels exceeded the WHO's recommended limit in 2019, with an average annual concentration of 23.4 μg/m3.\n\nOverall, the 2019 WHO Air Quality Report highlights the urgent need for cities in South Asia to take action to reduce air pollution and protect public health."
<think> and </think> tokens (or just a single </think> token to divide the two)
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'},
{'role': 'assistant',
'content': "Okay, the user asked, Who are you? I need to provide a clear and concise response. First, I should state my identity as Qwen, the large language model developed by Tongyi Lab. Next, I should mention my capabilities, such as answering questions, generating text, coding, and logical reasoning. It's important to highlight that I can handle multiple languages and is designed to be helpful and friendly. I should avoid technical jargon to keep it accessible. Also, I need to check if there's any specific context the user might have in mind, but since the question is straightforward, a general response should suffice. Let me make sure the response is structured well: start with who I am, then key features, and a friendly closing. Let me see... the user might be new to me, so keeping it simple and welcoming is key. Alright, time to put it all together in a natural, conversational tone.\n</think>\n\nI'm **Qwen**, a large language model developed by Tongyi Lab. I'm here to help with a wide range of tasks, including answering questions, writing stories, emails, scripts, logical reasoning, coding, and more. I can also express opinions and play games! Whether you need assistance with a problem or just want"}]}]
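Because the reasoning and the final answer arrive in one string, downstream code usually needs to separate them. The following is a minimal sketch of that split, assuming the closing `</think>` token appears verbatim in the generated text as it does in the output above; `split_thinking` is an illustrative helper, not part of the notebook.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer) on the closing think token."""
    marker = "</think>"
    if marker not in text:
        # For a thinking model, a missing marker usually means generation
        # stopped (e.g. hit the token limit) before the answer began.
        return text.strip(), ""
    reasoning, answer = text.split(marker, 1)
    # Some outputs also include the opening token; drop it if present.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


raw = "Okay, the user asked, Who are you?\n</think>\n\nI'm **Qwen**."
reasoning, answer = split_thinking(raw)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

An empty answer is a useful signal that the generation budget was too small for the model to finish thinking.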
Test out different prompts in the hallucinations notebook (hallucinations.ipynb)
Try different prompts with Qwen 4B Thinking (thinking.ipynb)
(You may have to increase max_new_tokens in the pipeline constructor)
“Tell me more about FilterGenie…”
Okay, the user is asking about FilterGenie, a macOS software for applying dynamic filters to images. Hmm, I need to recall what I know about this. Wait, I think FilterGenie might be a bit misleading here. Let me think carefully.\n\nFirst, I remember that there\'s a popular app called FilterGenie, but it\'s actually a Windows-only application. The user specifically mentioned it\'s for Mac. That\'s a red flag. I should double-check this to avoid giving wrong information. \n\nLet me mentally scan through common image editing apps for macOS. There\'s Lightroom, Affinity Photo, Photoshop, but those aren\'t called FilterGenie. Oh! I think there\'s a confusion with "FilterGenie" versus "FilterGenie" - wait, no. Actually, I recall that FilterGenie is a tool that\'s primarily for Windows. Maybe the user heard about it from a source that mixed up the platforms?\n\nI should consider if there\'s any Mac app with a similar name. Let me think... There\'s "Filter Magic" for Mac, but that\'s different. Or "Snapseed" by Google, but that\'s not it either. Hmm. \n\nWait a minute - I think the user might be mixing up FilterGen
"I don't have access to specific course details like instructors for CS-394, as that information is typically managed by educational institutions. To find out who teaches CS-394, you could:\n\n1. Check your university/department's official course schedule or syllabus. \n2. Contact the course instructor or department directly via email or phone. \n3. Use your school's learning management system (e.g., Moodle, Canvas) if the course is listed there. \n\nLet me know if you'd help drafting a message to inquire!\n"
import sqlite3
import os

def create_and_populate_db(db_path=DB_PATH):
    db_exists = os.path.exists(db_path)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()

    if db_exists:
        print(f"Found existing database at '{db_path}'. Skipping creation.")
        return conn

    print(f"Creating new database at '{db_path}'...")

    # --- Schema ---
    cursor.executescript("""
        CREATE TABLE instructors (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT,
            department TEXT,
            bio TEXT
        );
        CREATE TABLE courses (
            id INTEGER PRIMARY KEY,
            code TEXT NOT NULL UNIQUE,
            title TEXT NOT NULL,
            description TEXT,
            credits INTEGER,
            level TEXT CHECK(level IN ('intro', 'intermediate', 'advanced')),
            instructor_id INTEGER REFERENCES instructors(id),
            max_enrollment INTEGER
        );
        CREATE TABLE schedules (
            id INTEGER PRIMARY KEY,
            course_id INTEGER REFERENCES courses(id),
            days TEXT,
            time_start TEXT,
            time_end TEXT,
            room TEXT,
            semester TEXT
        );
    """)

    # --- Instructors ---
    cursor.executemany(
        "INSERT INTO instructors (name, email, department, bio) VALUES (?, ?, ?, ?)",
        [
            ("Dr. Sarah Chen", "s.chen@university.edu", "Computer Science",
             "Specializes in machine learning and NLP. Author of 'Practical Deep Learning'."),
            ("Prof. Marcus Webb", "m.webb@university.edu", "Computer Science",
             "Focuses on systems programming and computer architecture."),
            ("Dr. Priya Nair", "p.nair@university.edu", "Data Science",
             "Expert in statistical learning, data visualization, and reproducible research."),
            ("Prof. James Okafor", "j.okafor@university.edu", "Computer Science",
             "Teaches software engineering and has 10 years of industry experience at major tech firms."),
            ("Dr. Elena Russo", "e.russo@university.edu", "Mathematics",
             "Research interests include linear algebra, optimization, and mathematical foundations of AI."),
        ]
    )

    # --- Courses ---
    cursor.executemany(
        """INSERT INTO courses (code, title, description, credits, level, instructor_id, max_enrollment)
        VALUES (?, ?, ?, ?, ?, ?, ?)""",
        [
            ("CS-101", "Introduction to Programming",
             "Fundamentals of programming using Python. Covers variables, control flow, functions, and basic data structures.",
             3, "intro", 2, 40),
            ("CS-201", "Data Structures and Algorithms",
             "Core data structures including linked lists, trees, graphs, and hash tables. Algorithm design and complexity analysis.",
             3, "intermediate", 2, 35),
            ("CS-301", "Machine Learning Fundamentals",
             "Supervised and unsupervised learning, model evaluation, feature engineering, and scikit-learn. Final project required.",
             4, "intermediate", 1, 30),
            ("CS-394", "How Generative AI Works",
             "Transformer architectures, large language models, prompt engineering, fine-tuning, and deployment. Hands-on with open-source models.",
             3, "advanced", 1, 25),
            ("CS-310", "Database Systems",
             "Relational database design, SQL, transactions, indexing, and an introduction to NoSQL systems.",
             3, "intermediate", 2, 20)
        ]
    )

    # --- Schedules ---
    cursor.executemany(
        """INSERT INTO schedules (course_id, days, time_start, time_end, room, semester)
        VALUES (?, ?, ?, ?, ?, ?)""",
        [
            (1, "Mon/Wed/Fri", "09:00", "09:50", "Room 101", "Spring 2026"),
            (2, "Tue/Thu", "10:00", "11:20", "Room 204", "Spring 2026"),
            (3, "Mon/Wed", "13:00", "14:20", "Lab 12", "Spring 2026"),
            (4, "Tue/Thu", "14:00", "15:20", "Lab 12", "Spring 2026"),
            (5, "Mon/Wed/Fri", "11:00", "11:50", "Room 305", "Spring 2026"),
        ]
    )
    conn.commit()

    print("Database created and populated successfully.")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM instructors').fetchone()[0]} instructors")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM courses').fetchone()[0]} courses")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM schedules').fetchone()[0]} schedules")
    return conn

create_and_populate_db()

import sqlite3
import re

def get_course_by_code(conn, code):
    """Return full details for a single course by its code (e.g. 'CS-394')."""
    return conn.execute("""
        SELECT
            c.code,
            c.title,
            c.credits,
            c.level,
            c.description,
            c.max_enrollment,
            i.name AS instructor,
            i.email AS instructor_email,
            i.bio AS instructor_bio,
            s.days,
            s.time_start,
            s.time_end,
            s.room,
            s.semester
        FROM courses c
        JOIN instructors i ON c.instructor_id = i.id
        JOIN schedules s ON s.course_id = c.id
        WHERE UPPER(c.code) = UPPER(?)
    """, (code,)).fetchone()

def rows_to_text(rows):
    """Convert sqlite3.Row results into a readable string block."""
    if not rows:
        return "No results found."
    if isinstance(rows, sqlite3.Row):
        rows = [rows]
    return "\n".join(
        " " + ", ".join(f"{k}: {row[k]}" for k in row.keys())
        for row in rows
    )

def build_context(conn, user_query):
    """
    Extract a course code from the user query and retrieve its details.
    Returns a formatted string ready to inject into the system prompt.
    """
    code_match = re.search(r'\b([A-Z]{2,4}-\d{3})\b', user_query.upper())
    if not code_match:
        return "No course code found in query. Please include a course code (e.g. CS-394)."
    code = code_match.group(1)
    course = get_course_by_code(conn, code)
    if not course:
        return f"No course found with code {code}."
    return (
        f"Course details for {code}:\n{rows_to_text(course)}\n\n"
    )

# Connect to the sqlite database
conn = create_and_populate_db()
# Query the database based on the user prompt
context = build_context(conn, USER_PROMPT)
# Create system prompt
SYSTEM_PROMPT = f"""
You help students lookup course information. Here are the course details:
---COURSE INFORMATION---
{context}
---END COURSE INFORMATION---
"""
# Query the model
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
response.choices[0].message.content
Found existing database at '.data/course_catalog.db'. Skipping creation.
'The instructor for CS-394 "How Generative AI Works" is **Dr. Sarah Chen**.\n'
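The retrieval step in `build_context` hinges on finding a course code in the user's question. The sketch below isolates that step with the same regular expression so its behavior is easy to check; `extract_course_code` and the sample queries are illustrative, not taken from the notebook.

```python
import re

# Same pattern as build_context: 2-4 letters, a hyphen, then three digits.
COURSE_CODE = re.compile(r"\b([A-Z]{2,4}-\d{3})\b")


def extract_course_code(user_query):
    """Return the first course code in the query (upper-cased), or None."""
    match = COURSE_CODE.search(user_query.upper())
    return match.group(1) if match else None


for query in [
    "Who teaches cs-394?",
    "When does CS-101 meet?",
    "Which course covers generative AI?",
]:
    print(repr(query), "->", extract_course_code(query))
```

A question that names a topic rather than a code matches nothing, so this style of lookup can only supply context when the user already knows the course code.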
import os
import sqlite3
import numpy as np
import sqlite_vec

def create_connection(db_path=DB_PATH):
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    # Load the sqlite-vec extension, then disable extension loading again
    conn.enable_load_extension(True)
    sqlite_vec.load(conn)
    conn.enable_load_extension(False)
    return conn

def create_and_populate_db(db_path=DB_PATH):
    db_exists = os.path.exists(db_path)
    conn = create_connection(db_path)
    cursor = conn.cursor()

    if db_exists:
        print(f"Found existing database at '{db_path}'. Skipping creation.")
        return conn

    print(f"Creating new database at '{db_path}'...")

    # --- Schema ---
    cursor.executescript("""
        CREATE TABLE instructors (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT,
            department TEXT,
            bio TEXT
        );
        CREATE TABLE courses (
            id INTEGER PRIMARY KEY,
            code TEXT NOT NULL UNIQUE,
            title TEXT NOT NULL,
            description TEXT,
            credits INTEGER,
            level TEXT CHECK(level IN ('intro', 'intermediate', 'advanced')),
            instructor_id INTEGER REFERENCES instructors(id),
            max_enrollment INTEGER
        );
        CREATE TABLE schedules (
            id INTEGER PRIMARY KEY,
            course_id INTEGER REFERENCES courses(id),
            days TEXT,
            time_start TEXT,
            time_end TEXT,
            room TEXT,
            semester TEXT
        );
        CREATE VIRTUAL TABLE course_embeddings USING vec0(
            course_id INTEGER PRIMARY KEY,
            embedding FLOAT[384]
        );
    """)

    # --- Instructors ---
    cursor.executemany(
        "INSERT INTO instructors (name, email, department, bio) VALUES (?, ?, ?, ?)",
        [
            ("Dr. Sarah Chen", "s.chen@university.edu", "Computer Science",
             "Specializes in machine learning and NLP. Author of 'Practical Deep Learning'."),
            ("Prof. Marcus Webb", "m.webb@university.edu", "Computer Science",
             "Focuses on systems programming and computer architecture."),
            ("Dr. Priya Nair", "p.nair@university.edu", "Data Science",
             "Expert in statistical learning, data visualization, and reproducible research."),
            ("Prof. James Okafor", "j.okafor@university.edu", "Computer Science",
             "Teaches software engineering and has 10 years of industry experience at major tech firms."),
            ("Dr. Elena Russo", "e.russo@university.edu", "Mathematics",
             "Research interests include linear algebra, optimization, and mathematical foundations of AI."),
        ]
    )

    # --- Courses ---
    cursor.executemany(
        """INSERT INTO courses (code, title, description, credits, level, instructor_id, max_enrollment)
        VALUES (?, ?, ?, ?, ?, ?, ?)""",
        [
            ("CS-101", "Introduction to Programming",
             "Fundamentals of programming using Python. Covers variables, control flow, functions, and basic data structures.",
             3, "intro", 2, 40),
            ("CS-201", "Data Structures and Algorithms",
             "Core data structures including linked lists, trees, graphs, and hash tables. Algorithm design and complexity analysis.",
             3, "intermediate", 2, 35),
            ("CS-301", "Machine Learning Fundamentals",
             "Supervised and unsupervised learning, model evaluation, feature engineering, and scikit-learn. Final project required.",
             4, "intermediate", 1, 30),
            ("CS-394", "How Generative AI Works",
             "Transformer architectures, large language models, prompt engineering, fine-tuning, and deployment. Hands-on with open-source models.",
             3, "advanced", 1, 25),
            ("CS-310", "Database Systems",
             "Relational database design, SQL, transactions, indexing, and an introduction to NoSQL systems.",
             3, "intermediate", 4, 20),
        ]
    )

    # --- Schedules ---
    cursor.executemany(
        """INSERT INTO schedules (course_id, days, time_start, time_end, room, semester)
        VALUES (?, ?, ?, ?, ?, ?)""",
        [
            (1, "Mon/Wed/Fri", "09:00", "09:50", "Room 101", "Spring 2026"),
            (2, "Tue/Thu", "10:00", "11:20", "Room 204", "Spring 2026"),
            (3, "Mon/Wed", "13:00", "14:20", "Lab 12", "Spring 2026"),
            (4, "Tue/Thu", "14:00", "15:20", "Lab 12", "Spring 2026"),
            (5, "Mon/Wed/Fri", "11:00", "11:50", "Room 305", "Spring 2026"),
        ]
    )
    conn.commit()

    # --- Embeddings ---
    # Embed each course as "title: description" so the vector captures both
    print("Generating embeddings...")
    courses = conn.execute("SELECT id, title, description FROM courses").fetchall()
    for course in courses:
        text = f"{course['title']}: {course['description']}"
        embedding = embedder.encode(text).astype(np.float32)
        cursor.execute(
            "INSERT INTO course_embeddings (course_id, embedding) VALUES (?, ?)",
            (course['id'], embedding.tobytes())
        )
    conn.commit()

    print("Database created and populated successfully.")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM instructors').fetchone()[0]} instructors")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM courses').fetchone()[0]} courses")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM schedules').fetchone()[0]} schedules")
    print(f" - {cursor.execute('SELECT COUNT(*) FROM course_embeddings').fetchone()[0]} embeddings")
    return conn

conn = create_and_populate_db()

def search_courses(conn, user_query, top_k=3):
    """
    Embed the user query and find the most semantically similar courses.
    Returns the top_k closest courses with full details.
    """
    query_embedding = embedder.encode(user_query).astype(np.float32)
    return conn.execute("""
        SELECT
            c.code,
            c.title,
            c.description,
            c.credits,
            c.level,
            c.max_enrollment,
            i.name AS instructor,
            i.email AS instructor_email,
            s.days,
            s.time_start,
            s.time_end,
            s.room,
            s.semester,
            e.distance
        FROM course_embeddings e
        JOIN courses c ON e.course_id = c.id
        JOIN instructors i ON c.instructor_id = i.id
        JOIN schedules s ON s.course_id = c.id
        WHERE e.embedding MATCH ?
        AND k = ?
        ORDER BY e.distance
    """, (query_embedding.tobytes(), top_k)).fetchall()

def rows_to_text(rows):
    if not rows:
        return "No results found."
    if isinstance(rows, sqlite3.Row):
        rows = [rows]
    return "\n".join(
        " " + ", ".join(f"{k}: {row[k]}" for k in row.keys())
        for row in rows
    )

def build_context(conn, user_query):
    rows = search_courses(conn, user_query)
    return f"Most relevant courses for the query:\n{rows_to_text(rows)}"

# Connect to the sqlite database
conn = create_and_populate_db()
# Query the database based on the user prompt
context = build_context(conn, USER_PROMPT)
# Create system prompt
SYSTEM_PROMPT = f"""
You help students lookup course information. Here are the course details:
---COURSE INFORMATION---
{context}
---END COURSE INFORMATION---
"""
# Query the model
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
response.choices[0].message.content
Found existing database at '.data/course_catalog_rag.db'. Skipping creation.
'The course that specifically teaches about generative AI is:\n\n**CS-394: How Generative AI Works** \n- **Description**: Covers transformer architectures, large language models, prompt engineering, fine-tuning, and deployment. Includes hands-on work with open-source models. \n- **Details**: \n - Credits: 3 | Level: Advanced \n - Instructor: Dr. Sarah Chen | [Contact](mailto:s.chen@university.edu) \n - Time: Tue/Thu 14:00–15:20 | Room: Lab 12 \n - Semester: Spring 2026 \n\nThis course directly addresses generative AI concepts and techniques.\n'
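Conceptually, the `MATCH ? AND k = ?` query ranks the stored course vectors by their distance to the query vector and keeps the closest `k`. The sketch below does the same thing by brute force in plain Python. It assumes Euclidean (L2) distance and uses made-up 3-dimensional vectors in place of the real 384-dimensional embeddings; the vectors and the `top_k_nearest` helper are illustrative only.

```python
import math

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
course_vectors = {
    "CS-101": [0.9, 0.1, 0.0],
    "CS-310": [0.1, 0.9, 0.1],
    "CS-394": [0.0, 0.2, 0.9],
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_nearest(query_vector, vectors, k=3):
    """Return the k (code, distance) pairs closest to the query, nearest first."""
    scored = [(l2_distance(query_vector, vec), code) for code, vec in vectors.items()]
    return [(code, dist) for dist, code in sorted(scored)[:k]]


query = [0.1, 0.1, 0.8]  # stands in for the embedded user question
for code, dist in top_k_nearest(query, course_vectors, k=2):
    print(f"{code}: distance {dist:.3f}")
```

A vector index avoids scanning every row as this loop does, but the ranking it returns answers the same question.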
Investigate text-to-sql.ipynb and rag.ipynb
Try different queries - e.g., ask for courses on particular days or in particular locations
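For day or room questions it helps to know what a correct SQL answer looks like before judging the model's output. The sketch below builds an in-memory copy of a few catalog rows and filters them by day and/or room; `find_courses` is a hypothetical helper, and the reduced schema keeps only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE courses (id INTEGER PRIMARY KEY, code TEXT, title TEXT);
    CREATE TABLE schedules (course_id INTEGER, days TEXT, room TEXT);
""")
conn.executemany("INSERT INTO courses (id, code, title) VALUES (?, ?, ?)", [
    (1, "CS-101", "Introduction to Programming"),
    (3, "CS-301", "Machine Learning Fundamentals"),
    (4, "CS-394", "How Generative AI Works"),
])
conn.executemany("INSERT INTO schedules (course_id, days, room) VALUES (?, ?, ?)", [
    (1, "Mon/Wed/Fri", "Room 101"),
    (3, "Mon/Wed", "Lab 12"),
    (4, "Tue/Thu", "Lab 12"),
])


def find_courses(conn, day=None, room=None):
    """Filter courses by a day abbreviation (e.g. 'Tue') and/or an exact room."""
    sql = """
        SELECT c.code, c.title, s.days, s.room
        FROM courses c
        JOIN schedules s ON s.course_id = c.id
        WHERE (? IS NULL OR s.days LIKE '%' || ? || '%')
          AND (? IS NULL OR s.room = ?)
        ORDER BY c.code
    """
    # Each filter is bound twice: once for the NULL check, once for the comparison.
    return conn.execute(sql, (day, day, room, room)).fetchall()


print([row["code"] for row in find_courses(conn, room="Lab 12")])
print([row["code"] for row in find_courses(conn, day="Tue")])
```

Passing values as bound parameters rather than formatting them into the SQL string keeps the query safe when the filter text comes from a user or a model.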
https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro
https://huggingface.co/datasets/Idavidrein/gpqa
https://huggingface.co/datasets/cais/hle
{"messages": [{"role": "user", "content": "s = \"Hello\\nWorld\"\nprint(s)\n"}, {"role": "assistant", "content": "The code defines a string that contains the escape sequence \\n, which tells Python to start a new line when the string is displayed. When you run the print statement, \"Hello\" appears on one line and \"World\" appears on the next line, just like pressing Enter moves the cursor down in a text editor.\n\nImagine you are writing a story on a typewriter. When you finish a line, you press the \u201cEnter\u201d key to move to the next line and continue typing. The \\n character works the same way for strings in Python, signaling that the following text should start on a new line."}]}
Synthetic data generation using the generate-synthetic.ipynb notebook
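Each synthetic example is one JSON object with a `messages` list, as in the record above. Before using generated data, a quick structural check can catch malformed lines. The sketch below assumes every record holds exactly one user turn followed by one assistant turn; `validate_record` is an illustrative helper rather than part of the notebook.

```python
import json


def validate_record(line):
    """Parse one JSONL line and check it is a user/assistant chat example."""
    record = json.loads(line)
    messages = record["messages"]
    roles = [m["role"] for m in messages]
    if roles != ["user", "assistant"]:
        raise ValueError(f"unexpected roles: {roles}")
    if not all(isinstance(m["content"], str) and m["content"].strip() for m in messages):
        raise ValueError("empty or non-string content")
    return record


# A shortened record in the same shape as the example above.
line = json.dumps({"messages": [
    {"role": "user", "content": 's = "Hello\\nWorld"\nprint(s)\n'},
    {"role": "assistant", "content": "The code defines a string that contains the escape sequence \\n."},
]})
record = validate_record(line)
print(record["messages"][0]["content"])
```

Running the check over every line of a generated file gives a count of usable examples and points at the lines that need regenerating.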
