TOON Format Guide - Token-Oriented Object Notation

What is TOON?

TOON (Token-Oriented Object Notation) is a data serialization format that optimizes structured data for Large Language Model consumption. It achieves significant token reduction by eliminating redundant syntax while maintaining human readability and machine parseability.

Key Principles

Token Efficiency: Minimize token count without sacrificing clarity
Human Readable: Easy for developers to write and review
LLM Friendly: Structured format that models can reliably parse
Bidirectional: Converts to/from JSON without data loss

Core Syntax Rules

1. Key-Value Pairs

Simple key-value pairs use a colon separator with no quotes around keys:

JSON

{
  "name": "Alice",
  "age": 30,
  "city": "New York"
}

TOON

name: Alice
age: 30
city: New York

Token Savings

This example saves approximately 8 tokens (40% reduction) by removing quotes and braces.

2. Indentation for Nesting

TOON uses YAML-style indentation (typically 2 or 4 spaces) to represent nested structures:

JSON

{
  "user": {
    "profile": {
      "name": "Alice",
      "email": "alice@example.com"
    }
  }
}

TOON

user:
    profile:
        name: Alice
        email: alice@example.com
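
The key-value and indentation rules above can be sketched as a small recursive encoder. This is a minimal Python illustration under stated assumptions, not a reference implementation: the function name `encode_lines` and its `indent` parameter are hypothetical, and it handles only nested dicts and scalar values (no arrays, no quoting).

```python
def encode_lines(value: dict, indent: int = 4, level: int = 0) -> list[str]:
    """Render nested dicts as indented TOON key-value lines (sketch only)."""
    pad = " " * (indent * level)
    lines = []
    for key, item in value.items():
        if isinstance(item, dict):
            # A nested object: emit the bare key, then its children one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(encode_lines(item, indent, level + 1))
        elif item is None:
            # Assumption: null is written as an empty value, as in the data type table.
            lines.append(f"{pad}{key}:")
        elif isinstance(item, bool):
            lines.append(f"{pad}{key}: {'true' if item else 'false'}")
        else:
            lines.append(f"{pad}{key}: {item}")
    return lines


doc = {"user": {"profile": {"name": "Alice", "email": "alice@example.com"}}}
print("\n".join(encode_lines(doc)))
```

Run on the JSON above, this should print the same four TOON lines shown in the example.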

Supported Data Types

Type	JSON Example	TOON Example
String	`"hello"`	`hello`
Number	`42` or `3.14`	`42` or `3.14`
Boolean	`true` / `false`	`true` / `false`
Null	`null`	(empty)
String with special chars	`"hello, world"`	`"hello, world"`

💡 Quoting Rules

Strings only need quotes if they contain: delimiters, newlines, leading/trailing whitespace, or colons. Simple strings are unquoted.
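
The quoting rules can be expressed as a short check. The sketch below is an illustration in Python: `quote_if_needed` is a hypothetical helper name, the quote-doubling step follows the rule given later under Escaping Quotes, and treating an embedded double quote as a reason to quote is an assumption the guide does not state explicitly.

```python
def quote_if_needed(text: str, delimiter: str = ",") -> str:
    """Return `text` unquoted when it is simple, otherwise wrapped in double quotes."""
    needs_quotes = (
        delimiter in text          # would be mistaken for a column separator
        or ":" in text             # would be mistaken for a key-value separator
        or "\n" in text            # would break the line-oriented layout
        or text != text.strip()    # leading/trailing whitespace would be lost
        or '"' in text             # assumption: embedded quotes also force quoting
    )
    if not needs_quotes:
        return text
    # Escape embedded quotes by doubling them, then wrap the whole value.
    return '"' + text.replace('"', '""') + '"'


for sample in ["hello", "New York", "hello, world", "  Premium item  "]:
    print(quote_if_needed(sample))
```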

Arrays & Tabular Data

Uniform Object Arrays (The Power of TOON)

This is where TOON shines. Uniform arrays (objects with identical keys) use a CSV-style table format:

JSON (124 characters)

[
  {
    "id": 1,
    "name": "Alice",
    "role": "Developer"
  },
  {
    "id": 2,
    "name": "Bob",
    "role": "Designer"
  }
]

TOON (52 characters)

root[2]{id,name,role}:
    1,Alice,Developer
    2,Bob,Designer

🎯 Result

58% character reduction! This translates to approximately 20-25 fewer tokens for just 2 records. The savings scale with dataset size.

Syntax Breakdown

arrayName[length]{field1,field2,field3}:
    value1,value2,value3
    value1,value2,value3

arrayName - The key for this array
[length] - Optional length marker (e.g., [100])
{field1,field2} - Column headers (field names)
: - Header terminator
Each subsequent line is a data row with comma-separated values
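
To make the header and row layout concrete, here is a hedged Python sketch that builds a table from a uniform list of dicts. The name `encode_table` and the `indent` parameter are hypothetical, and the sketch assumes values are plain strings or numbers that need no quoting.

```python
def encode_table(name: str, rows: list[dict], indent: int = 4) -> str:
    """Render a uniform list of dicts as a TOON table (sketch: no quoting)."""
    if not rows:
        raise ValueError("need at least one row to derive the column headers")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows must share the same keys in the same order")
    columns = ",".join(fields)
    # Header: arrayName[length]{field1,field2,...}:
    header = f"{name}[{len(rows)}]{{{columns}}}:"
    pad = " " * indent
    body = [pad + ",".join(str(row[field]) for field in fields) for row in rows]
    return "\n".join([header, *body])


people = [
    {"id": 1, "name": "Alice", "role": "Developer"},
    {"id": 2, "name": "Bob", "role": "Designer"},
]
print(encode_table("root", people))
```

With the two records from the example above, the output should match the TOON table shown earlier.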

Non-Uniform Arrays

For arrays with mixed types or varying structures, TOON uses list notation:

items:
    - Apple
    - 42
    - true
    - name: Complex Object
      value: 100
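
Deciding between table notation and list notation comes down to a uniformity check. The following Python sketch shows one way to do it: `is_uniform` is a hypothetical helper, and requiring every value to be a scalar is an assumption added here so that each row fits on one line.

```python
def is_uniform(items: list) -> bool:
    """True when every item is a dict with the same keys and only scalar values."""
    if not items or not all(isinstance(item, dict) for item in items):
        return False
    keys = set(items[0])
    scalar = (str, int, float, bool, type(None))
    return all(
        set(item) == keys and all(isinstance(v, scalar) for v in item.values())
        for item in items
    )


mixed = ["Apple", 42, True, {"name": "Complex Object", "value": 100}]
print(is_uniform(mixed))  # mixed types, so list notation applies
```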

Nested Objects

Complex nested structures combine indentation with table notation:

company:
    name: TechCorp
    employees[3]{id,name,department}:
        1,Alice,Engineering
        2,Bob,Design
        3,Charlie,Sales
    metadata:
        founded: 2020
        location: San Francisco

Delimiter Options

TOON supports multiple delimiters for tabular data:

Comma (default)

1,Alice,Dev

Tab

1	Alice	Dev

Pipe

1|Alice|Dev

💡 Choosing a Delimiter

• Comma: Best for most use cases
• Tab: Good when data contains many commas
• Pipe: Useful for data with commas and tabs
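
The guidance above can be turned into a simple heuristic. This Python sketch is a simplification: `choose_delimiter` is a hypothetical name, and it switches delimiter as soon as any value contains the current candidate, whereas the guide speaks of data containing "many" commas.

```python
def choose_delimiter(values: list[str]) -> str:
    """Pick the first of comma, tab, pipe that appears in none of the values."""
    for candidate in (",", "\t", "|"):
        if not any(candidate in value for value in values):
            return candidate
    # Every candidate collides: fall back to comma and rely on quoting.
    return ","


print(repr(choose_delimiter(["Smith, Alice", "Dev"])))
```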

Length Markers

Length markers indicate the number of items in an array, helping LLMs validate data completeness:

Without Length Marker

users{id,name}:
    1,Alice
    2,Bob
    3,Charlie

With Length Marker

users[3]{id,name}:
    1,Alice
    2,Bob
    3,Charlie

Benefits

Data Validation: LLMs can verify they received complete data
Immediate Context: Structure visible at a glance
Counting Queries: Improves accuracy for "how many" questions

⚠️ Token Cost

Length markers add 1-3 tokens per array depending on the number's size. For most use cases, this is a worthwhile tradeoff for improved reliability.
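
The same completeness check that a length marker enables for an LLM can be done in code. The Python sketch below is illustrative only: `check_length` and the `HEADER` pattern are hypothetical, the pattern assumes array names made of word characters, and the input is assumed to be a single table whose remaining non-blank lines are all data rows.

```python
import re

# arrayName, optional [length], {fields}, then the terminating colon.
HEADER = re.compile(r"^(?P<name>\w+)(?:\[(?P<length>\d+)\])?\{(?P<fields>[^}]*)\}:$")


def check_length(block: str) -> bool:
    """True if the declared length matches the row count, or no length is declared."""
    lines = [line for line in block.splitlines() if line.strip()]
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("first line is not a table header")
    declared = match.group("length")
    return declared is None or int(declared) == len(lines) - 1


truncated = "users[3]{id,name}:\n    1,Alice\n    2,Bob"
print(check_length(truncated))  # a row is missing, so the check fails
```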

Special Cases & Edge Handling

Empty Values

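users[2]{id,name,email}:
    1,Alice,alice@example.com
    2,Bob,

An empty value is written by leaving its field blank: nothing appears between the surrounding delimiters, or after the last delimiter when the field is the final column (as with Bob's email above).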

Special Characters in Values

Values containing delimiters, newlines, or leading/trailing spaces must be quoted:

products[2]{id,description}:
    1,"High-quality, durable product"
    2,"  Premium item  "

Escaping Quotes

Double quotes inside quoted strings are escaped by doubling them:

messages[1]{id,text}:
    1,"She said ""Hello"" to me"
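
Because quoting and quote-doubling here follow the same convention as CSV, Python's standard `csv` module can split a row. This is a sketch under that assumption: `parse_row` is a hypothetical helper, it assumes rows are indented with spaces, and it returns every field as a string without converting numbers or booleans.

```python
import csv


def parse_row(line: str, delimiter: str = ",") -> list[str]:
    """Split one table row, honouring quoted values and doubled-quote escapes."""
    # Strip only the space indentation and the line ending, so a trailing
    # delimiter (an empty last field) is preserved.
    cleaned = line.lstrip(" ").rstrip("\r\n")
    return next(csv.reader([cleaned], delimiter=delimiter))


print(parse_row('    1,"She said ""Hello"" to me"'))
print(parse_row("    2,Bob,"))
```

The first call should yield the id and the sentence with single double quotes restored; the second should yield an empty string as the third field.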

Mixed Nested Arrays

When arrays contain both uniform objects and other types:

data:
    - items[2]{id,name}:
        1,Apple
        2,Banana
    - tags:
        - fruit
        - organic
    - metadata:
        count: 2
        updated: 2025-01-15

Best Practices

DO: Normalize Data Before Conversion

Ensure all objects in an array have the same keys. Add null or empty string for missing fields to maintain uniformity.

Good (all objects have the same keys):

users[2]{id,name,email}:
    1,Alice,alice@example.com
    2,Bob,
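
Normalization can be done in a few lines before conversion. The Python sketch below is one possible approach: `normalize` and its `fill` parameter are hypothetical names, and columns are ordered by first appearance across the records.

```python
def normalize(records: list[dict], fill=None) -> list[dict]:
    """Give every record the same keys, in first-seen order, filling gaps with `fill`."""
    fields: list[str] = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    return [{key: record.get(key, fill) for key in fields} for record in records]


users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob"},  # missing email
]
print(normalize(users, fill=""))
```

After this step both records carry `id`, `name`, and `email`, so the array qualifies for table notation.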

DO: Use Consistent Indentation

Stick with either 2 or 4 spaces throughout your TOON document. Mixing indentation styles reduces readability.

DO: Add Context in LLM Prompts

When sending TOON to an LLM, include a brief description such as "Here's user data in TOON format:" before the data. This helps the model parse it correctly.

DON'T: Force Non-Uniform Data into Tables

If your objects have wildly different structures, use list notation or stick with JSON. TOON's table format requires consistency.

DON'T: Use for Deeply Nested Structures (5+ levels)

TOON's indentation becomes hard to read with deep nesting. JSON handles complex hierarchies better.

DON'T: Use for Tiny Datasets

For single objects or arrays of 1-3 items, the absolute token savings are negligible. TOON shines with 10+ uniform records.

Real-World Examples

Example 1: E-commerce Product Catalog

JSON (312 chars, ~78 tokens)

[
  {
    "sku": "PROD-001",
    "name": "Laptop",
    "price": 1299.99,
    "stock": 45,
    "category": "Electronics"
  },
  {
    "sku": "PROD-002",
    "name": "Mouse",
    "price": 29.99,
    "stock": 150,
    "category": "Accessories"
  },
  {
    "sku": "PROD-003",
    "name": "Keyboard",
    "price": 89.99,
    "stock": 78,
    "category": "Accessories"
  }
]

TOON (141 chars, ~35 tokens)

root[3]{sku,name,price,stock,category}:
    PROD-001,Laptop,1299.99,45,Electronics
    PROD-002,Mouse,29.99,150,Accessories
    PROD-003,Keyboard,89.99,78,Accessories

Savings: 55% reduction (~43 tokens saved)
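
The savings figure is a plain percentage difference, which can be checked from the counts quoted in this example. The helper name `reduction_percent` in this Python sketch is hypothetical.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage saved when going from `before` units to `after` units."""
    return round(100 * (before - after) / before, 1)


print(reduction_percent(312, 141))  # character counts quoted for Example 1
print(reduction_percent(78, 35))    # approximate token counts quoted for Example 1
```

Both values come out close to the 55% stated above (54.8 for characters, 55.1 for tokens).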

Example 2: Analytics Dashboard Data

JSON

{
  "dashboard": {
    "metrics": [
      {
        "name": "Page Views",
        "current": 45230,
        "previous": 42180,
        "change": 7.2
      },
      {
        "name": "Unique Visitors",
        "current": 12450,
        "previous": 13100,
        "change": -5.0
      }
    ],
    "updated": "2025-01-15"
  }
}

TOON

dashboard:
    metrics[2]{name,current,previous,change}:
        "Page Views",45230,42180,7.2
        "Unique Visitors",12450,13100,-5.0
    updated: 2025-01-15

Savings: 48% reduction

Example 3: User Activity Log

activity_log:
    user:
        id: 12345
        email: alice@example.com
    events[4]{timestamp,action,page,duration}:
        "2025-01-15T10:30:00",view,/dashboard,45
        "2025-01-15T10:31:00",click,/products,12
        "2025-01-15T10:32:00",search,/search,8
        "2025-01-15T10:33:00",purchase,/checkout,120
    session:
        ip: 192.168.1.1
        device: desktop
        browser: Chrome
