TOON Format Guide

Token-Oriented Object Notation - a compact, human-readable format designed to minimize token usage in Large Language Model prompts.

Version 1.0 30-60% Token Savings LLM Optimized

What is TOON?

TOON (Token-Oriented Object Notation) is a data serialization format that optimizes structured data for Large Language Model consumption. It achieves significant token reduction by eliminating redundant syntax while maintaining human readability and machine parseability.

Key Principles

  • Token Efficiency: Minimize token count without sacrificing clarity
  • Human Readable: Easy for developers to write and review
  • LLM Friendly: Structured format that models can reliably parse
  • Bidirectional: Converts to/from JSON without data loss

Core Syntax Rules

1. Key-Value Pairs

Simple key-value pairs use a colon separator with no quotes around keys:

JSON

{
  "name": "Alice",
  "age": 30,
  "city": "New York"
}

TOON

name: Alice
age: 30
city: New York

Token Savings

This example saves approximately 8 tokens (40% reduction) by removing quotes and braces.

2. Indentation for Nesting

TOON uses YAML-style indentation (typically 2 or 4 spaces) to represent nested structures:

JSON

{
  "user": {
    "profile": {
      "name": "Alice",
      "email": "alice@example.com"
    }
  }
}

TOON

user:
    profile:
        name: Alice
        email: alice@example.com

Supported Data Types

Type JSON Example TOON Example
String "hello" hello
Number 42 or 3.14 42 or 3.14
Boolean true / false true / false
Null null (empty)
String with special chars "hello, world" "hello, world"

💡 Quoting Rules

Strings only need quotes if they contain: delimiters, newlines, leading/trailing whitespace, or colons. Simple strings are unquoted.

Arrays & Tabular Data

Uniform Object Arrays (The Power of TOON)

This is where TOON shines. Uniform arrays (objects with identical keys) use a CSV-style table format:

JSON (124 characters)

[
  {
    "id": 1,
    "name": "Alice",
    "role": "Developer"
  },
  {
    "id": 2,
    "name": "Bob",
    "role": "Designer"
  }
]

TOON (52 characters)

root[2]{id,name,role}:
    1,Alice,Developer
    2,Bob,Designer

🎯 Result

58% character reduction! This translates to approximately 20-25 fewer tokens for just 2 records. The savings scale with dataset size.

Syntax Breakdown

arrayName[length]{field1,field2,field3}:
    value1,value2,value3
    value1,value2,value3
  • arrayName - The key for this array
  • [length] - Optional length marker (e.g., [100])
  • {field1,field2} - Column headers (field names)
  • : - Header terminator
  • Each subsequent line is a data row with comma-separated values

Non-Uniform Arrays

For arrays with mixed types or varying structures, TOON uses list notation:

items:
    - Apple
    - 42
    - true
    - name: Complex Object
      value: 100

Nested Objects

Complex nested structures combine indentation with table notation:

company:
    name: TechCorp
    employees[3]{id,name,department}:
        1,Alice,Engineering
        2,Bob,Design
        3,Charlie,Sales
    metadata:
        founded: 2020
        location: San Francisco

Delimiter Options

TOON supports multiple delimiters for tabular data:

Comma (default)

1,Alice,Dev

Tab

1	Alice	Dev

Pipe

1|Alice|Dev

💡 Choosing a Delimiter

  • Comma: Best for most use cases
  • Tab: Good when data contains many commas
  • Pipe: Useful for data with commas and tabs

Length Markers

Length markers indicate the number of items in an array, helping LLMs validate data completeness:

Without Length Marker

users{id,name}:
1,Alice
2,Bob
3,Charlie

With Length Marker

users[3]{id,name}:
1,Alice
2,Bob
3,Charlie

Benefits

  • Data Validation: LLMs can verify they received complete data
  • Immediate Context: Structure visible at a glance
  • Counting Queries: Improves accuracy for "how many" questions

⚠️ Token Cost

Length markers add 1-3 tokens per array depending on the number's size. For most use cases, this is a worthwhile tradeoff for improved reliability.

Special Cases & Edge Handling

Empty Values

users[2]{id,name,email}:
1,Alice,alice@example.com
2,Bob,

Empty values are represented by omitting content between delimiters.

Special Characters in Values

Values containing delimiters, newlines, or leading/trailing spaces must be quoted:

products[2]{id,description}:
1,"High-quality, durable product"
2,"  Premium item  "

Escaping Quotes

Double quotes inside quoted strings are escaped by doubling them:

messages[1]{id,text}:
1,"She said ""Hello"" to me"

Mixed Nested Arrays

When arrays contain both uniform objects and other types:

data:
- items[2]{id,name}:
    1,Apple
    2,Banana
- tags:
    - fruit
    - organic
- metadata:
    count: 2
    updated: 2025-01-15

Best Practices

DO: Normalize Data Before Conversion

Ensure all objects in an array have the same keys. Add null or empty string for missing fields to maintain uniformity.

// Good - All objects have same keys users[2]{id,name,email}:
1,Alice,alice@example.com
2,Bob,

DO: Use Consistent Indentation

Stick with either 2 or 4 spaces throughout your TOON document. Mixing indentation styles reduces readability.

DO: Add Context in LLM Prompts

When sending TOON to an LLM, include a brief description: "Here's user data in TOON format:" This helps models parse correctly.

DON'T: Force Non-Uniform Data into Tables

If your objects have wildly different structures, use list notation or stick with JSON. TOON's table format requires consistency.

DON'T: Use for Deeply Nested Structures (5+ levels)

TOON's indentation becomes hard to read with deep nesting. JSON handles complex hierarchies better.

DON'T: Use for Tiny Datasets

For single objects or 1-3 item arrays, token savings are negligible. TOON shines with 10+ uniform records.

Real-World Examples

Example 1: E-commerce Product Catalog

JSON (312 chars, ~78 tokens)

[{
"sku": "PROD-001",
"name": "Laptop",
"price": 1299.99,
"stock": 45,
"category": "Electronics"
},
{
"sku": "PROD-002",
"name": "Mouse",
"price": 29.99,
"stock": 150,
"category": "Accessories"
},
{
"sku": "PROD-003",
"name": "Keyboard",
"price": 89.99,
"stock": 78,
"category": "Accessories"
}
]

TOON (141 chars, ~35 tokens)

root[3]{sku,name,price,stock,category}:
PROD-001,Laptop,1299.99,45,Electronics
PROD-002,Mouse,29.99,150,Accessories
PROD-003,Keyboard,89.99,78,Accessories

Savings: 55% reduction (~43 tokens saved)

Example 2: Analytics Dashboard Data

JSON

{"dashboard": {
"metrics": [
{
"name": "Page Views",
"current": 45230,
"previous": 42180,
"change": 7.2
},
{
"name": "Unique Visitors",
"current": 12450,
"previous": 13100,
"change": -5.0
}
],
"updated": "2025-01-15"
}
}

TOON

dashboard:
metrics[2]{name,current,previous,change}:
"Page Views",45230,42180,7.2
"Unique Visitors",12450,13100,-5.0
updated: 2025-01-15

Savings: 48% reduction

Example 3: User Activity Log

activity_log:
user:
    id: 12345
    email: alice@example.com
events[4]{timestamp,action,page,duration}:
    2025-01-15T10:30:00,view,/dashboard,45
    2025-01-15T10:31:00,click,/products,12
    2025-01-15T10:32:00,search,/search,8
    2025-01-15T10:33:00,purchase,/checkout,120
session:
    ip: 192.168.1.1
    device: desktop
    browser: Chrome

Ready to optimize your LLM prompts?

Use our free JSON to TOON converter to see immediate token savings on your data.

Try the Converter →