TOON Format Guide
Token-Oriented Object Notation - a compact, human-readable format designed to minimize token usage in Large Language Model prompts.
What is TOON?
TOON (Token-Oriented Object Notation) is a data serialization format that optimizes structured data for Large Language Model consumption. It achieves significant token reduction by eliminating redundant syntax while maintaining human readability and machine parseability.
Key Principles
- Token Efficiency: Minimize token count without sacrificing clarity
- Human Readable: Easy for developers to write and review
- LLM Friendly: Structured format that models can reliably parse
- Bidirectional: Converts to/from JSON without data loss
Core Syntax Rules
1. Key-Value Pairs
Simple key-value pairs use a colon separator with no quotes around keys:
JSON
{
"name": "Alice",
"age": 30,
"city": "New York"
}
TOON
name: Alice
age: 30
city: New York
Token Savings
This example saves approximately 8 tokens (40% reduction) by removing quotes and braces.
2. Indentation for Nesting
TOON uses YAML-style indentation (typically 2 or 4 spaces) to represent nested structures:
JSON
{
"user": {
"profile": {
"name": "Alice",
"email": "alice@example.com"
}
}
}
TOON
user:
profile:
name: Alice
email: alice@example.com
Supported Data Types
| Type | JSON Example | TOON Example |
|---|---|---|
| String | "hello" |
hello |
| Number | 42 or 3.14 |
42 or 3.14 |
| Boolean | true / false |
true / false |
| Null | null |
(empty) |
| String with special chars | "hello, world" |
"hello, world" |
💡 Quoting Rules
Strings only need quotes if they contain: delimiters, newlines, leading/trailing whitespace, or colons. Simple strings are unquoted.
Arrays & Tabular Data
Uniform Object Arrays (The Power of TOON)
This is where TOON shines. Uniform arrays (objects with identical keys) use a CSV-style table format:
JSON (124 characters)
[
{
"id": 1,
"name": "Alice",
"role": "Developer"
},
{
"id": 2,
"name": "Bob",
"role": "Designer"
}
]
TOON (52 characters)
root[2]{id,name,role}:
1,Alice,Developer
2,Bob,Designer
🎯 Result
58% character reduction! This translates to approximately 20-25 fewer tokens for just 2 records. The savings scale with dataset size.
Syntax Breakdown
arrayName[length]{field1,field2,field3}:
value1,value2,value3
value1,value2,value3
arrayName- The key for this array[length]- Optional length marker (e.g.,[100]){field1,field2}- Column headers (field names):- Header terminator- Each subsequent line is a data row with comma-separated values
Non-Uniform Arrays
For arrays with mixed types or varying structures, TOON uses list notation:
items:
- Apple
- 42
- true
- name: Complex Object
value: 100
Nested Objects
Complex nested structures combine indentation with table notation:
company:
name: TechCorp
employees[3]{id,name,department}:
1,Alice,Engineering
2,Bob,Design
3,Charlie,Sales
metadata:
founded: 2020
location: San Francisco
Delimiter Options
TOON supports multiple delimiters for tabular data:
Comma (default)
1,Alice,Dev
Tab
1 Alice Dev
Pipe
1|Alice|Dev
💡 Choosing a Delimiter
- • Comma: Best for most use cases
- • Tab: Good when data contains many commas
- • Pipe: Useful for data with commas and tabs
Length Markers
Length markers indicate the number of items in an array, helping LLMs validate data completeness:
Without Length Marker
users{id,name}:
1,Alice
2,Bob
3,Charlie
With Length Marker
users[3]{id,name}:
1,Alice
2,Bob
3,Charlie
Benefits
- Data Validation: LLMs can verify they received complete data
- Immediate Context: Structure visible at a glance
- Counting Queries: Improves accuracy for "how many" questions
⚠️ Token Cost
Length markers add 1-3 tokens per array depending on the number's size. For most use cases, this is a worthwhile tradeoff for improved reliability.
Special Cases & Edge Handling
Empty Values
users[2]{id,name,email}:
1,Alice,alice@example.com
2,Bob,
Empty values are represented by omitting content between delimiters.
Special Characters in Values
Values containing delimiters, newlines, or leading/trailing spaces must be quoted:
products[2]{id,description}:
1,"High-quality, durable product"
2," Premium item "
Escaping Quotes
Double quotes inside quoted strings are escaped by doubling them:
messages[1]{id,text}:
1,"She said ""Hello"" to me"
Mixed Nested Arrays
When arrays contain both uniform objects and other types:
data:
- items[2]{id,name}:
1,Apple
2,Banana
- tags:
- fruit
- organic
- metadata:
count: 2
updated: 2025-01-15
Best Practices
DO: Normalize Data Before Conversion
Ensure all objects in an array have the same keys. Add null or empty string for missing fields to maintain uniformity.
// Good - All objects have same keys users[2]{id,name,email}:
1,Alice,alice@example.com
2,Bob,
DO: Use Consistent Indentation
Stick with either 2 or 4 spaces throughout your TOON document. Mixing indentation styles reduces readability.
DO: Add Context in LLM Prompts
When sending TOON to an LLM, include a brief description: "Here's user data in TOON format:" This helps models parse correctly.
DON'T: Force Non-Uniform Data into Tables
If your objects have wildly different structures, use list notation or stick with JSON. TOON's table format requires consistency.
DON'T: Use for Deeply Nested Structures (5+ levels)
TOON's indentation becomes hard to read with deep nesting. JSON handles complex hierarchies better.
DON'T: Use for Tiny Datasets
For single objects or 1-3 item arrays, token savings are negligible. TOON shines with 10+ uniform records.
Real-World Examples
Example 1: E-commerce Product Catalog
JSON (312 chars, ~78 tokens)
[{
"sku": "PROD-001",
"name": "Laptop",
"price": 1299.99,
"stock": 45,
"category": "Electronics"
},
{
"sku": "PROD-002",
"name": "Mouse",
"price": 29.99,
"stock": 150,
"category": "Accessories"
},
{
"sku": "PROD-003",
"name": "Keyboard",
"price": 89.99,
"stock": 78,
"category": "Accessories"
}
]
TOON (141 chars, ~35 tokens)
root[3]{sku,name,price,stock,category}:
PROD-001,Laptop,1299.99,45,Electronics
PROD-002,Mouse,29.99,150,Accessories
PROD-003,Keyboard,89.99,78,Accessories
Savings: 55% reduction (~43 tokens saved)
Example 2: Analytics Dashboard Data
JSON
{"dashboard": {
"metrics": [
{
"name": "Page Views",
"current": 45230,
"previous": 42180,
"change": 7.2
},
{
"name": "Unique Visitors",
"current": 12450,
"previous": 13100,
"change": -5.0
}
],
"updated": "2025-01-15"
}
}
TOON
dashboard:
metrics[2]{name,current,previous,change}:
"Page Views",45230,42180,7.2
"Unique Visitors",12450,13100,-5.0
updated: 2025-01-15
Savings: 48% reduction
Example 3: User Activity Log
activity_log:
user:
id: 12345
email: alice@example.com
events[4]{timestamp,action,page,duration}:
2025-01-15T10:30:00,view,/dashboard,45
2025-01-15T10:31:00,click,/products,12
2025-01-15T10:32:00,search,/search,8
2025-01-15T10:33:00,purchase,/checkout,120
session:
ip: 192.168.1.1
device: desktop
browser: Chrome
Ready to optimize your LLM prompts?
Use our free JSON to TOON converter to see immediate token savings on your data.
Try the Converter →