Description
This ticket is for adding TOON support for ZIO Schema 2, as a new Format, with associated codec, deriver, tests, and documentation.
NOTE: What follows is an AI-generated description of the problem and sketch of solution--it may be useful, but it certainly contains errors, and if you don't know enough to find and fix those errors, you shouldn't attempt to complete this ticket.
TOON Format Implementation Guide for ZIO Schema 2
Executive Summary
This guide provides a complete specification for implementing TOON (Token-Oriented Object Notation) codec support in ZIO Schema 2 (zio-blocks). TOON is a compact, human-readable serialization format designed to minimize token usage when passing structured data to Large Language Models, achieving 30-60% token reduction compared to JSON while maintaining lossless bidirectional conversion.
The implementation will follow the established patterns in zio-blocks, mirroring the architecture of JsonBinaryCodecDeriver while adding TOON-specific capabilities for array format selection and indentation-based structure.
Part 1: TOON Format Specification
1.1 Overview
TOON was created by Johann Schopplich in 2025 to address the inefficiency of JSON when used in LLM prompts. The format combines YAML-style indentation with CSV-style tabular data representation. The specification is maintained at github.com/toon-format/spec, currently at version 3.0.
Design goals:
- Minimize token count for LLM context windows
- Maintain human readability
- Enable lossless JSON↔TOON conversion
- Schema-aware encoding for maximum compression
1.2 Data Types
TOON supports the complete JSON data model:
| Type | TOON Representation | Example |
|---|---|---|
| String | Unquoted (default) or quoted | hello or "hello, world" |
| Number | Decimal form only (no scientific notation) | 42, 3.14159 |
| Boolean | Lowercase keywords | true, false |
| Null | Keyword | null |
| Array | Three formats (see §1.4) | items[3]: a,b,c |
| Object | Indentation-based nesting | See §1.3 |
1.3 Object Encoding
Objects use indentation (2 spaces default) with colon-separated key-value pairs:
name: Alice
age: 30
address:
  street: 123 Main St
  city: Springfield
Equivalent JSON:
{"name":"Alice","age":30,"address":{"street":"123 Main St","city":"Springfield"}}
Key rules:
- Keys are unquoted unless they contain special characters
- Primitive values sit on the same line as their key; nested structures are indented on the lines below it
- Empty objects: just the key with colon and nothing following
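The key rules above can be sketched as a small renderer. This is only an illustration under stated assumptions, not the zio-blocks implementation: `Value`, `Prim`, `Obj`, and `render` are hypothetical names, and primitives are assumed to arrive as already-rendered text.

```scala
object ObjSketch {
  // Hypothetical minimal value model, for illustration only.
  sealed trait Value
  final case class Prim(text: String) extends Value // pre-rendered primitive
  final case class Obj(fields: List[(String, Value)]) extends Value

  // One `key: value` line per primitive field; a nested object becomes a
  // bare `key:` line followed by its own fields, one level deeper.
  def render(obj: Obj, indentSize: Int = 2, depth: Int = 0): List[String] = {
    val pad = " " * (indentSize * depth)
    obj.fields.flatMap {
      case (key, Prim(text))  => List(s"$pad$key: $text")
      case (key, nested: Obj) => s"$pad$key:" :: render(nested, indentSize, depth + 1)
    }
  }
}
```

An empty nested object yields only the `key:` line, because the recursive call contributes no further lines.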
1.4 Array Encoding Formats
TOON's primary innovation is intelligent array encoding. The format supports three array representations:
Tabular Format (Maximum Compression)
For arrays of uniform objects where all elements share identical keys with only primitive values:
users[3]{id,name,email}:
  1,Alice,[email protected]
  2,Bob,[email protected]
  3,Carol,[email protected]
Equivalent JSON:
{"users":[{"id":1,"name":"Alice","email":"[email protected]"},{"id":2,"name":"Bob","email":"[email protected]"},{"id":3,"name":"Carol","email":"[email protected]"}]}
Tabular eligibility requirements:
- All elements must be objects
- All objects must have identical keys in the same order
- All field values must be primitives (not nested objects or arrays)
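The three eligibility requirements can be checked in one pass over the elements. The sketch below assumes a hypothetical `Value` model (`Prim`, `Obj`, `Arr`) and a hypothetical helper `tabularHeader`; it returns the shared field names when the array qualifies and `None` otherwise.

```scala
object TabSketch {
  // Hypothetical value model, for illustration only.
  sealed trait Value
  final case class Prim(text: String) extends Value
  final case class Obj(fields: List[(String, Value)]) extends Value
  final case class Arr(items: List[Value]) extends Value

  // Some(keys) when every element is an object with the same keys in the
  // same order and only primitive field values; None otherwise.
  def tabularHeader(items: List[Value]): Option[List[String]] = items match {
    case (first: Obj) :: _ =>
      val keys = first.fields.map(_._1)
      val uniform = items.forall {
        case Obj(fields) =>
          fields.map(_._1) == keys && fields.forall(_._2.isInstanceOf[Prim])
        case _ => false
      }
      if (uniform) Some(keys) else None
    case _ => None // empty array, or first element is not an object
  }
}
```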
Inline Format (Primitive Arrays)
For arrays containing only primitive values:
tags[4]: javascript,react,typescript,node
numbers[5]: 1,2,3,4,5
List Format (Heterogeneous Data)
For arrays with mixed types, nested structures, or non-uniform objects:
items[3]:
  - name: Widget
    price: 9.99
  - name: Gadget
    price: 19.99
  - simple string value
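To make the three layouts concrete, here is a rough sketch that picks a layout and renders it. It assumes a hypothetical `Elem` model restricted to primitives and flat objects, treats values as already quoted where necessary, and hard-codes the comma delimiter; none of these names come from zio-blocks.

```scala
object ArrSketch {
  // Hypothetical model: an element is a primitive or a flat object.
  sealed trait Elem
  final case class Prim(text: String) extends Elem
  final case class Obj(fields: List[(String, String)]) extends Elem

  def encode(key: String, items: List[Elem], indent: String = "  "): List[String] = {
    val n     = items.length
    val prims = items.collect { case Prim(t) => t }
    val objs  = items.collect { case Obj(fs) => fs }
    if (items.isEmpty) List(s"$key[0]:")
    else if (prims.length == n)
      List(s"$key[$n]: ${prims.mkString(",")}") // inline layout
    else if (objs.length == n && objs.head.nonEmpty &&
             objs.forall(_.map(_._1) == objs.head.map(_._1))) {
      // tabular layout: one header line, then one row per element
      val header = s"$key[$n]{${objs.head.map(_._1).mkString(",")}}:"
      header :: objs.map(fs => indent + fs.map(_._2).mkString(","))
    } else
      // list layout: "- " marker, continuation fields aligned under the first
      s"$key[$n]:" :: items.flatMap {
        case Prim(t)  => List(s"$indent- $t")
        case Obj(Nil) => List(s"$indent-")
        case Obj(first :: rest) =>
          s"$indent- ${first._1}: ${first._2}" ::
            rest.map { case (k, v) => s"$indent  $k: $v" }
      }
  }
}
```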
1.5 String Quoting Rules
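Strings are unquoted by default. Quotes are required only when the string:
- Is empty
- Contains the active delimiter (comma by default)
- Contains a colon (:)
- Contains a double quote or a backslash
- Has leading or trailing whitespace
- Contains control characters
- Contains any of the characters {, }, [, ]
- Would otherwise read as true, false, null, or a number
Escape sequences (only these five are valid):
- \\ → backslash
- \" → double quote
- \n → newline
- \r → carriage return
- \t → tab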
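A minimal sketch of the quoting decision and the five escapes follows. The names `needsQuotes`, `escape`, and `encode` are hypothetical, and the numeric-lookalike pattern is an assumption about what a decoder would treat as a number; check it against the TOON specification before relying on it.

```scala
object StrSketch {
  // Assumed shape of text a decoder would read as a number.
  private val numberLike = """-?\d+(\.\d+)?([eE][+-]?\d+)?"""

  def needsQuotes(s: String, delimiter: Char = ','): Boolean =
    s.isEmpty ||
      s != s.trim || // leading or trailing whitespace
      s == "true" || s == "false" || s == "null" ||
      s.matches(numberLike) ||
      s.exists { c =>
        c == delimiter || c == ':' || c == '"' || c == '\\' ||
        "{}[]".indexOf(c) >= 0 || c.isControl
      }

  // Applies only the five escape sequences listed above.
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c    => sb.append(c)
    }
    sb.toString
  }

  def encode(s: String, delimiter: Char = ','): String =
    if (needsQuotes(s, delimiter)) "\"" + escape(s) + "\"" else s
}
```

Because the delimiter is a parameter, a value such as `a,b` stays unquoted when the active delimiter is a pipe, while `a|b` then needs quotes.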
1.6 Number Formatting
TOON requires decimal form without scientific notation:
| Value | JSON | TOON |
|---|---|---|
| 15 billion | 1.5e10 | 15000000000 |
| Tiny | 1e-10 | 0.0000000001 |
| NaN | N/A | null |
| Infinity | N/A | null |
| -0 | -0 | 0 |
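The rows of this table can be reproduced for `Double` values with `java.math.BigDecimal`. The sketch below is one possible approach, with `format` as a hypothetical helper name; dropping trailing fractional zeros (so `42.0` becomes `42`) is an assumption made here, not something the table states.

```scala
object NumSketch {
  def format(d: Double): String =
    if (d.isNaN || d.isInfinity) "null" // no TOON representation
    else if (d == 0.0) "0"              // also true for -0.0
    else
      // valueOf goes through Double.toString, so 1.5e10 becomes "1.5E10";
      // toPlainString then expands the exponent into plain digits.
      java.math.BigDecimal.valueOf(d).stripTrailingZeros.toPlainString
}
```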
1.7 Key Folding (Optional)
Chains of single-key wrapper objects can be collapsed:
user.profile.settings.theme: dark
Equivalent to:
user:
  profile:
    settings:
      theme: dark
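Folding amounts to walking down a chain of single-key objects and joining the keys with dots. The sketch below uses a hypothetical `Value` model and `fold` helper; refusing to fold a child key that itself contains a dot is an assumption added here so that the folded path stays unambiguous.

```scala
object FoldSketch {
  // Hypothetical value model, for illustration only.
  sealed trait Value
  final case class Prim(text: String) extends Value
  final case class Obj(fields: List[(String, Value)]) extends Value

  // Collapses `key -> {child -> v}` into `key.child -> v`, repeatedly,
  // while the value is an object with exactly one field.
  def fold(key: String, value: Value): (String, Value) = value match {
    case Obj(List((childKey, child))) if childKey.indexOf('.') < 0 =>
      fold(s"$key.$childKey", child)
    case other => (key, other)
  }
}
```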
Part 2: ZIO Schema 2 Architecture
2.1 Core Abstractions
ZIO Schema 2 uses a deriver-based architecture where format codecs are derived from Schema[A] definitions. The key components are:
// The schema definition
case class Person(name: String, age: Int)
object Person {
  implicit val schema: Schema[Person] = Schema.derived
}
// Deriving a codec
val jsonCodec: JsonBinaryCodec[Person] = Schema[Person].derive(JsonFormat.deriver)
2.2 Deriver Trait
The Deriver[TC[_]] trait defines how to derive type class instances for different schema shapes:
trait Deriver[TC[_]] {
  def derivePrimitive[F[_, _], A](
    primitiveType: PrimitiveType[A],
    typeName: TypeName[A],
    binding: Binding[BindingType.Primitive, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  ): Lazy[TC[A]]
  def deriveRecord[F[_, _], A](
    fields: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Record, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[A]]
  def deriveVariant[F[_, _], A](
    cases: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Variant, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[A]]
  def deriveSequence[F[_, _], C[_], A](
    element: Reflect[F, A],
    typeName: TypeName[C[A]],
    binding: Binding[BindingType.Seq[C], C[A]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[C[A]]]
  def deriveMap[F[_, _], M[_, _], K, V](
    key: Reflect[F, K],
    value: Reflect[F, V],
    typeName: TypeName[M[K, V]],
    binding: Binding[BindingType.Map[M], M[K, V]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[M[K, V]]]
  def deriveDynamic[F[_, _]](
    binding: Binding[BindingType.Dynamic, DynamicValue],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[DynamicValue]]
  def deriveWrapper[F[_, _], A, B](
    wrapped: Reflect[F, B],
    typeName: TypeName[A],
    wrapperPrimitiveType: Option[PrimitiveType[A]],
    binding: Binding[BindingType.Wrapper[A, B], A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[A]]
}
2.3 BinaryCodec Pattern
Codecs extend BinaryCodec[A] and work with streaming readers/writers:
abstract class JsonBinaryCodec[A](val valueType: Int = JsonBinaryCodec.objectType)
    extends BinaryCodec[A] {
  // Core methods to implement
  def decodeValue(in: JsonReader, default: A): A
  def encodeValue(x: A, out: JsonWriter): Unit
  // Optional key encoding (for map keys)
  def decodeKey(in: JsonReader): A
  def encodeKey(x: A, out: JsonWriter): Unit
  // Null value for initialization
  def nullValue: A = null.asInstanceOf[A]
  // Public API
  def decode(input: ByteBuffer, config: ReaderConfig): Either[SchemaError, A]
  def encode(value: A, output: ByteBuffer, config: WriterConfig): Unit
}
2.4 Configuration Architecture
Configuration is split between two concerns:
Semantic configuration lives on the deriver class itself:
class JsonBinaryCodecDeriver(
  fieldNameMapper: NameMapper,           // Field name transformation
  caseNameMapper: NameMapper,            // Case/variant name transformation
  discriminatorKind: DiscriminatorKind,  // ADT encoding strategy
  rejectExtraFields: Boolean,            // Fail on unknown fields
  enumValuesAsStrings: Boolean,          // Enum encoding style
  transientNone: Boolean,                // Omit None values
  requireOptionFields: Boolean,          // Require Option fields
  transientEmptyCollection: Boolean,     // Omit empty collections
  requireCollectionFields: Boolean,      // Require collection fields
  transientDefaultValue: Boolean,        // Omit default-valued fields
  requireDefaultValueFields: Boolean     // Require fields with defaults
) extends Deriver[JsonBinaryCodec]
Runtime configuration lives in separate config classes:
// ReaderConfig: buffer sizes and parsing behavior
class ReaderConfig(
  val preferredBufSize: Int,       // Default: 32768
  val preferredCharBufSize: Int,   // Default: 4096
  val maxBufSize: Int,             // Default: 33554432
  val maxCharBufSize: Int,         // Default: 4194304
  val checkForEndOfInput: Boolean  // Default: true
)
// WriterConfig: output formatting
class WriterConfig(
  val indentionStep: Int,     // Default: 0 (compact)
  val preferredBufSize: Int,  // Default: 32768
  val escapeUnicode: Boolean  // Default: false
)
2.5 DiscriminatorKind for ADTs
Sum types (sealed traits) support three encoding strategies:
sealed trait DiscriminatorKind
object DiscriminatorKind {
  // Wrapper object: {"Cat": {"name": "Whiskers"}}
  case object Key extends DiscriminatorKind // DEFAULT
  // Embedded field: {"type": "Cat", "name": "Whiskers"}
  case class Field(name: String) extends DiscriminatorKind
  // No discriminator: try each case sequentially
  case object None extends DiscriminatorKind
}
2.6 NameMapper for Field Transformation
sealed trait NameMapper extends (String => String)
object NameMapper {
  case object Identity extends NameMapper   // No transformation (default)
  case object SnakeCase extends NameMapper  // memberName → member_name
  case object CamelCase extends NameMapper  // member_name → memberName
  case object PascalCase extends NameMapper // member_name → MemberName
  case object KebabCase extends NameMapper  // memberName → member-name
  case class Custom(f: String => String) extends NameMapper
}
2.7 Modifier System
ZIO Schema 2 uses Modifier classes (not Java annotations) for customization:
// Rename a field or case
@Modifier.rename("new_name")
case class Example(field: String)
// Add decoding aliases
@Modifier.alias("old_name")
case object Blue extends Color
// Mark field as transient (excluded from serialization)
@Modifier.transient()
val internalField: Int = 0
Programmatic application:
val codec = Color.schema
  .deriving(JsonBinaryCodecDeriver)
  .modifier(Color.red, Modifier.rename("Rose"))
  .modifier(Color.red, Modifier.alias("Ruby"))
  .derive
Part 3: TOON Implementation Design
3.1 Module Structure
zio-blocks/
└── schema-toon/
    └── src/main/scala/zio/blocks/schema/toon/
        ├── ToonFormat.scala              # Format definition object
        ├── ToonBinaryCodec.scala         # Abstract codec class
        ├── ToonBinaryCodecDeriver.scala  # Deriver implementation
        ├── ToonReader.scala              # Streaming parser
        ├── ToonWriter.scala              # Streaming serializer
        ├── ReaderConfig.scala            # Parser configuration
        ├── WriterConfig.scala            # Serializer configuration
        ├── ArrayFormat.scala             # TOON-specific array encoding
        └── DiscriminatorKind.scala       # Reuse or extend from JSON
3.2 ToonFormat Object
package zio.blocks.schema.toon
import zio.blocks.schema.codec.BinaryFormat
/**
 * The TOON format for ZIO Schema 2.
 *
 * TOON (Token-Oriented Object Notation) is a compact serialization format
 * optimized for LLM token efficiency, achieving 30-60% reduction vs JSON.
 */
object ToonFormat extends BinaryFormat("application/toon", ToonBinaryCodecDeriver)
3.3 ArrayFormat Enum
package zio.blocks.schema.toon
/**
 * Specifies how arrays should be encoded in TOON format.
 */
sealed trait ArrayFormat
object ArrayFormat {
  /**
   * Automatically select the most compact format based on array contents:
   * - Tabular for uniform object arrays with primitive fields
   * - Inline for primitive arrays
   * - List for heterogeneous or nested data
   */
  case object Auto extends ArrayFormat
  /**
   * Force tabular format: `items[N]{field1,field2}: val1,val2`
   * Falls back to List if array is not tabular-eligible.
   */
  case object Tabular extends ArrayFormat
  /**
   * Force inline format: `items[N]: val1,val2,val3`
   * Only valid for primitive arrays.
   */
  case object Inline extends ArrayFormat
  /**
   * Force list format with `- ` markers.
   */
  case object List extends ArrayFormat
}
3.4 ToonBinaryCodecDeriver
package zio.blocks.schema.toon
import zio.blocks.schema._
import zio.blocks.schema.binding._
import zio.blocks.schema.codec.BinaryFormat
import zio.blocks.schema.derive._
import zio.blocks.schema.json.{DiscriminatorKind, NameMapper}
/**
 * Default TOON deriver with standard settings.
 */
object ToonBinaryCodecDeriver extends ToonBinaryCodecDeriver(
  fieldNameMapper = NameMapper.Identity,
  caseNameMapper = NameMapper.Identity,
  discriminatorKind = DiscriminatorKind.Key,
  arrayFormat = ArrayFormat.Auto,
  delimiter = ',',
  rejectExtraFields = false,
  enumValuesAsStrings = true,
  transientNone = true,
  requireOptionFields = false,
  transientEmptyCollection = true,
  requireCollectionFields = false,
  transientDefaultValue = true,
  requireDefaultValueFields = false,
  enableKeyFolding = false
)
/**
 * Deriver for TOON binary codecs with configurable behavior.
 *
 * @param fieldNameMapper Transform strategy for field names
 * @param caseNameMapper Transform strategy for variant case names
 * @param discriminatorKind ADT encoding strategy (Key, Field, None)
 * @param arrayFormat Array encoding preference (Auto, Tabular, Inline, List)
 * @param delimiter Value separator in tabular/inline arrays (comma default)
 * @param rejectExtraFields Fail decoding on unrecognized fields
 * @param enumValuesAsStrings Encode case object enums as strings
 * @param transientNone Omit None-valued Option fields
 * @param requireOptionFields Require Option fields to be present
 * @param transientEmptyCollection Omit empty collection fields
 * @param requireCollectionFields Require collection fields to be present
 * @param transientDefaultValue Omit fields matching their default value
 * @param requireDefaultValueFields Require fields with defaults to be present
 * @param enableKeyFolding Enable dotted key path expansion
 */
class ToonBinaryCodecDeriver private[toon] (
  fieldNameMapper: NameMapper,
  caseNameMapper: NameMapper,
  discriminatorKind: DiscriminatorKind,
  arrayFormat: ArrayFormat,
  delimiter: Char,
  rejectExtraFields: Boolean,
  enumValuesAsStrings: Boolean,
  transientNone: Boolean,
  requireOptionFields: Boolean,
  transientEmptyCollection: Boolean,
  requireCollectionFields: Boolean,
  transientDefaultValue: Boolean,
  requireDefaultValueFields: Boolean,
  enableKeyFolding: Boolean
) extends Deriver[ToonBinaryCodec] {
  // Builder methods
  def withFieldNameMapper(mapper: NameMapper): ToonBinaryCodecDeriver =
    copy(fieldNameMapper = mapper)
  def withCaseNameMapper(mapper: NameMapper): ToonBinaryCodecDeriver =
    copy(caseNameMapper = mapper)
  def withDiscriminatorKind(kind: DiscriminatorKind): ToonBinaryCodecDeriver =
    copy(discriminatorKind = kind)
  def withArrayFormat(format: ArrayFormat): ToonBinaryCodecDeriver =
    copy(arrayFormat = format)
  def withDelimiter(delim: Char): ToonBinaryCodecDeriver =
    copy(delimiter = delim)
  def withRejectExtraFields(reject: Boolean): ToonBinaryCodecDeriver =
    copy(rejectExtraFields = reject)
  def withEnumValuesAsStrings(asStrings: Boolean): ToonBinaryCodecDeriver =
    copy(enumValuesAsStrings = asStrings)
  def withTransientNone(transient: Boolean): ToonBinaryCodecDeriver =
    copy(transientNone = transient)
  def withKeyFolding(enabled: Boolean): ToonBinaryCodecDeriver =
    copy(enableKeyFolding = enabled)
  // ... additional builder methods ...
  private def copy(
    fieldNameMapper: NameMapper = fieldNameMapper,
    caseNameMapper: NameMapper = caseNameMapper,
    discriminatorKind: DiscriminatorKind = discriminatorKind,
    arrayFormat: ArrayFormat = arrayFormat,
    delimiter: Char = delimiter,
    rejectExtraFields: Boolean = rejectExtraFields,
    enumValuesAsStrings: Boolean = enumValuesAsStrings,
    transientNone: Boolean = transientNone,
    requireOptionFields: Boolean = requireOptionFields,
    transientEmptyCollection: Boolean = transientEmptyCollection,
    requireCollectionFields: Boolean = requireCollectionFields,
    transientDefaultValue: Boolean = transientDefaultValue,
    requireDefaultValueFields: Boolean = requireDefaultValueFields,
    enableKeyFolding: Boolean = enableKeyFolding
  ): ToonBinaryCodecDeriver = new ToonBinaryCodecDeriver(
    fieldNameMapper, caseNameMapper, discriminatorKind, arrayFormat,
    delimiter, rejectExtraFields, enumValuesAsStrings, transientNone,
    requireOptionFields, transientEmptyCollection, requireCollectionFields,
    transientDefaultValue, requireDefaultValueFields, enableKeyFolding
  )
  // Deriver implementation
  override def derivePrimitive[F[_, _], A](
    primitiveType: PrimitiveType[A],
    typeName: TypeName[A],
    binding: Binding[BindingType.Primitive, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  ): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: return appropriate codec for primitive type
    ???
  }
  override def deriveRecord[F[_, _], A](
    fields: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Record, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: derive codec for case class / record
    ???
  }
  override def deriveVariant[F[_, _], A](
    cases: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Variant, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: derive codec for sealed trait / enum
    // Handle discriminatorKind, enumValuesAsStrings, caseNameMapper
    ???
  }
  override def deriveSequence[F[_, _], C[_], A](
    element: Reflect[F, A],
    typeName: TypeName[C[A]],
    binding: Binding[BindingType.Seq[C], C[A]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[C[A]]] = Lazy {
    // Implementation: derive codec for sequences
    // Key TOON logic: select array format based on arrayFormat setting
    // and element uniformity analysis
    ???
  }
  override def deriveMap[F[_, _], M[_, _], K, V](
    key: Reflect[F, K],
    value: Reflect[F, V],
    typeName: TypeName[M[K, V]],
    binding: Binding[BindingType.Map[M], M[K, V]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[M[K, V]]] = Lazy {
    // Implementation: derive codec for maps
    ???
  }
  override def deriveDynamic[F[_, _]](
    binding: Binding[BindingType.Dynamic, DynamicValue],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[DynamicValue]] = Lazy {
    // Implementation: derive codec for dynamic values
    ???
  }
  override def deriveWrapper[F[_, _], A, B](
    wrapped: Reflect[F, B],
    typeName: TypeName[A],
    wrapperPrimitiveType: Option[PrimitiveType[A]],
    binding: Binding[BindingType.Wrapper[A, B], A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: derive codec for wrapper types (newtypes)
    ???
  }
}
3.5 ToonBinaryCodec
package zio.blocks.schema.toon
import zio.blocks.schema.SchemaError
import zio.blocks.schema.codec.BinaryCodec
import java.nio.ByteBuffer
/**
 * Abstract codec for TOON encoding/decoding.
 *
 * @param valueType Optimization hint for primitive types
 */
abstract class ToonBinaryCodec[A](val valueType: Int = ToonBinaryCodec.objectType)
    extends BinaryCodec[A] {
  /**
   * Decode a value from a TOON reader.
   *
   * @param in The TOON reader providing input
   * @param default Default value for initialization
   * @return The decoded value
   */
  def decodeValue(in: ToonReader, default: A): A
  /**
   * Encode a value to a TOON writer.
   *
   * @param x The value to encode
   * @param out The TOON writer for output
   */
  def encodeValue(x: A, out: ToonWriter): Unit
  /**
   * Decode a value used as a map key.
   */
  def decodeKey(in: ToonReader): A =
    in.decodeError("decoding as TOON key is not supported")
  /**
   * Encode a value as a map key.
   */
  def encodeKey(x: A, out: ToonWriter): Unit =
    out.encodeError("encoding as TOON key is not supported")
  /**
   * The null/default value for this type.
   */
  def nullValue: A = null.asInstanceOf[A]
  // Public API
  override def decode(input: ByteBuffer): Either[SchemaError, A] =
    decode(input, ToonReaderConfig)
  override def encode(value: A, output: ByteBuffer): Unit =
    encode(value, output, ToonWriterConfig)
  def decode(input: ByteBuffer, config: ToonReaderConfig): Either[SchemaError, A]
  def encode(value: A, output: ByteBuffer, config: ToonWriterConfig): Unit
  // Convenience methods for byte arrays and strings
  def decodeFromString(input: String): Either[SchemaError, A]
  def encodeToString(value: A): String
}
object ToonBinaryCodec {
  val objectType = 0
  val intType = 1
  val longType = 2
  val floatType = 3
  val doubleType = 4
  val booleanType = 5
  val byteType = 6
  val charType = 7
  val shortType = 8
  val unitType = 9
  // Predefined primitive codecs
  val unitCodec: ToonBinaryCodec[Unit] = ???
  val booleanCodec: ToonBinaryCodec[Boolean] = ???
  val byteCodec: ToonBinaryCodec[Byte] = ???
  val shortCodec: ToonBinaryCodec[Short] = ???
  val intCodec: ToonBinaryCodec[Int] = ???
  val longCodec: ToonBinaryCodec[Long] = ???
  val floatCodec: ToonBinaryCodec[Float] = ???
  val doubleCodec: ToonBinaryCodec[Double] = ???
  val charCodec: ToonBinaryCodec[Char] = ???
  val stringCodec: ToonBinaryCodec[String] = ???
  val bigIntCodec: ToonBinaryCodec[BigInt] = ???
  val bigDecimalCodec: ToonBinaryCodec[BigDecimal] = ???
  // ... java.time codecs, UUID, Currency, etc.
}
3.6 Configuration Classes
package zio.blocks.schema.toon
/**
 * Configuration for ToonReader.
 *
 * @param preferredBufSize Preferred byte buffer size
 * @param preferredCharBufSize Preferred char buffer size
 * @param maxBufSize Maximum byte buffer size
 * @param maxCharBufSize Maximum char buffer size
 * @param checkForEndOfInput Verify no trailing content after parsing
 * @param strictArrayLength Validate array length markers match actual count
 */
class ToonReaderConfig private (
  val preferredBufSize: Int,
  val preferredCharBufSize: Int,
  val maxBufSize: Int,
  val maxCharBufSize: Int,
  val checkForEndOfInput: Boolean,
  val strictArrayLength: Boolean
) extends Serializable {
  def withStrictArrayLength(strict: Boolean): ToonReaderConfig =
    copy(strictArrayLength = strict)
  // ... other builder methods
  // Not a case class, so copy is written by hand (same pattern as the deriver)
  private def copy(
    preferredBufSize: Int = preferredBufSize,
    preferredCharBufSize: Int = preferredCharBufSize,
    maxBufSize: Int = maxBufSize,
    maxCharBufSize: Int = maxCharBufSize,
    checkForEndOfInput: Boolean = checkForEndOfInput,
    strictArrayLength: Boolean = strictArrayLength
  ): ToonReaderConfig = new ToonReaderConfig(
    preferredBufSize, preferredCharBufSize, maxBufSize, maxCharBufSize,
    checkForEndOfInput, strictArrayLength
  )
}
object ToonReaderConfig extends ToonReaderConfig(
  preferredBufSize = 32768,
  preferredCharBufSize = 4096,
  maxBufSize = 33554432,
  maxCharBufSize = 4194304,
  checkForEndOfInput = true,
  strictArrayLength = true
)
/**
 * Configuration for ToonWriter.
 *
 * @param indentSize Spaces per indentation level (default: 2)
 * @param preferredBufSize Preferred output buffer size
 * @param lineEnding Line ending style (LF recommended per spec)
 */
class ToonWriterConfig private (
  val indentSize: Int,
  val preferredBufSize: Int,
  val lineEnding: String
) extends Serializable {
  def withIndentSize(size: Int): ToonWriterConfig =
    copy(indentSize = size)
  // ... other builder methods
  // Hand-written copy, as above
  private def copy(
    indentSize: Int = indentSize,
    preferredBufSize: Int = preferredBufSize,
    lineEnding: String = lineEnding
  ): ToonWriterConfig = new ToonWriterConfig(indentSize, preferredBufSize, lineEnding)
}
object ToonWriterConfig extends ToonWriterConfig(
  indentSize = 2,
  preferredBufSize = 32768,
  lineEnding = "\n"
)
Part 4: Encoding Rules and Algorithms
4.1 Array Format Selection Algorithm
When ArrayFormat.Auto is configured, the encoder must analyze array contents:
def selectArrayFormat[A](elements: Iterable[A], elementCodec: ToonBinaryCodec[A]): ArrayFormat = {
  if (elements.isEmpty) {
    ArrayFormat.Inline // Empty arrays: items[0]:
  } else if (isPrimitiveCodec(elementCodec)) {
    ArrayFormat.Inline // Primitive arrays: items[3]: a,b,c
  } else if (isUniformObjectArray(elements)) {
    ArrayFormat.Tabular // Uniform objects: items[N]{fields}: rows...
  } else {
    ArrayFormat.List // Everything else: - item format
  }
}
def isUniformObjectArray[A](elements: Iterable[A]): Boolean = {
  // Check that:
  // 1. All elements are objects (case classes)
  // 2. All have identical field names in same order
  // 3. All field values are primitives (not nested objects/arrays)
  ???
}
4.2 String Encoding Rules
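def requiresQuoting(s: String, delimiter: Char): Boolean = {
  s.isEmpty ||
  s.charAt(0).isWhitespace ||
  s.charAt(s.length - 1).isWhitespace ||
  s.indexOf(delimiter) >= 0 ||
  s.indexOf(':') >= 0 ||
  s.indexOf('"') >= 0 ||
  s.indexOf('\\') >= 0 ||
  s.indexOf('{') >= 0 ||
  s.indexOf('}') >= 0 ||
  s.indexOf('[') >= 0 ||
  s.indexOf(']') >= 0 ||
  looksLikeLiteral(s) ||
  containsControlCharacters(s)
}
// A bare true/false/null or number would decode as that type, not as a string
def looksLikeLiteral(s: String): Boolean =
  s == "true" || s == "false" || s == "null" ||
  s.matches("""-?\d+(\.\d+)?([eE][+-]?\d+)?""")
def encodeString(s: String, delimiter: Char, out: ToonWriter): Unit = {
  if (requiresQuoting(s, delimiter)) {
    out.writeQuotedString(s) // Escape \, ", \n, \r, \t
  } else {
    out.writeRawString(s)
  }
}
4.3 Number Encoding Rules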
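def encodeNumber(n: Double, out: ToonWriter): Unit = {
  if (n.isNaN || n.isInfinity) {
    out.writeNull() // NaN and ±Infinity have no TOON representation
  } else if (n == 0.0) {
    out.writeRaw("0") // Normalize -0 to 0 (-0.0 == 0.0 is true)
  } else {
    // Convert to non-exponential decimal form
    out.writeRaw(java.math.BigDecimal.valueOf(n).stripTrailingZeros.toPlainString)
  }
}
// BigDecimal cannot hold NaN, Infinity, or -0, so only the exponent needs removing
def encodeNumber(n: BigDecimal, out: ToonWriter): Unit =
  out.writeRaw(n.bigDecimal.toPlainString)
4.4 ADT Encoding with Discriminators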
DiscriminatorKind.Key (default):
Cat:
  name: Whiskers
  lives: 9
DiscriminatorKind.Field("type"):
type: Cat
name: Whiskers
lives: 9
DiscriminatorKind.None:
name: Whiskers
lives: 9
(Decoder tries each case sequentially)
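The three layouts differ only in where the case name goes. The sketch below re-declares a local `DiscriminatorKind` so that it is self-contained; `encodeCase` is a hypothetical helper, and field values are assumed to be pre-rendered text.

```scala
object AdtSketch {
  // Local stand-in mirroring the DiscriminatorKind described in §2.5.
  sealed trait DiscriminatorKind
  object DiscriminatorKind {
    case object Key extends DiscriminatorKind
    final case class Field(name: String) extends DiscriminatorKind
    case object None extends DiscriminatorKind
  }

  def encodeCase(
    caseName: String,
    fields: List[(String, String)],
    kind: DiscriminatorKind
  ): List[String] = {
    val lines = fields.map { case (k, v) => s"$k: $v" }
    kind match {
      // Wrapper key: case name on its own line, fields one level deeper
      case DiscriminatorKind.Key         => s"$caseName:" :: lines.map("  " + _)
      // Embedded field: discriminator is just the first field
      case DiscriminatorKind.Field(name) => s"$name: $caseName" :: lines
      // No discriminator: the fields alone
      case DiscriminatorKind.None        => lines
    }
  }
}
```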
4.5 Tabular Array Encoding
For uniform object arrays:
def encodeTabularArray[A](
  fieldName: String,
  elements: IndexedSeq[A],
  fieldNames: IndexedSeq[String],
  fieldCodecs: IndexedSeq[ToonBinaryCodec[?]],
  out: ToonWriter
): Unit = {
  // Header: fieldName[count]{field1,field2,...}:
  out.writeRaw(fieldName)
  out.writeRaw("[")
  out.writeRaw(elements.length.toString)
  out.writeRaw("]{")
  out.writeRaw(fieldNames.mkString(","))
  out.writeRaw("}:")
  out.newLine()
  // Rows: value1,value2,...
  elements.foreach { element =>
    out.writeIndent()
    fieldCodecs.zipWithIndex.foreach { case (codec, idx) =>
      if (idx > 0) out.writeRaw(",")
      codec.encodeValue(getField(element, idx), out)
    }
    out.newLine()
  }
}
Part 5: Acceptance Criteria
5.1 Functional Requirements
Primitive Types
- All primitive types encode/decode correctly: Unit, Boolean, Byte, Short, Int, Long, Float, Double, Char, String, BigInt, BigDecimal
- All java.time types: Instant, LocalDate, LocalTime, LocalDateTime, OffsetDateTime, ZonedDateTime, Duration, Period, Year, YearMonth, MonthDay, Month, DayOfWeek, ZoneId, ZoneOffset
- UUID and Currency types
- Numbers use decimal form (no scientific notation)
- NaN and Infinity encode as null
- -0 normalizes to 0
Strings
- Unquoted strings work for simple values
- Quoted strings handle delimiters, colons, whitespace, control characters
- Only valid escape sequences: \\, \", \n, \r, \t
- UTF-8 encoding with LF line endings
Arrays
- ArrayFormat.Auto selects optimal format
- Tabular format for uniform object arrays
- Inline format for primitive arrays
- List format for heterogeneous data
- Array length markers [N] are accurate
- Empty arrays encode correctly: items[0]:
- Custom delimiter support (comma, tab, pipe)
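On the decoding side, the length-marker criterion implies parsing the array header and comparing the declared count with what was actually read. The following is a rough sketch for comma-delimited headers only; `Header`, `parse`, and `checkLength` are hypothetical names, and the regular expression is an approximation rather than the grammar from the specification.

```scala
object HdrSketch {
  final case class Header(key: String, length: Int, fields: Option[List[String]])

  // key[N]:  or  key[N]{f1,f2}:  (the key may be empty for a root array)
  private val Pattern =
    """^([^\[\]{}:]*)\[(\d{1,9})\](?:\{([^}]+)\})?:.*$""".r

  def parse(line: String): Option[Header] = line.trim match {
    case Pattern(key, len, fields) =>
      // An unmatched optional group is null, hence Option(...)
      Some(Header(key, len.toInt, Option(fields).map(_.split(',').toList)))
    case _ => None
  }

  def checkLength(h: Header, actual: Int): Either[String, Unit] =
    if (h.length == actual) Right(())
    else Left(s"array length mismatch: expected ${h.length}, got $actual")
}
```

A reader configured with `strictArrayLength = false` would presumably skip `checkLength` and accept whatever rows are present.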
Objects/Records
- Indentation-based nesting works correctly
- Field name transformation via NameMapper
- Transient field handling (None, empty collections, defaults)
- Required field validation
- Extra field rejection (configurable)
- Modifier.rename and Modifier.alias support
ADTs/Variants
- DiscriminatorKind.Key (wrapper object) works
- DiscriminatorKind.Field embeds discriminator
- DiscriminatorKind.None tries cases sequentially
- Case name transformation via NameMapper
- enumValuesAsStrings for case object enums
- Nested ADTs work correctly
- Modifier.rename and Modifier.alias on cases
Maps
- String-keyed maps encode as objects
- Non-string-keyed maps use array of pairs or error
Wrappers/Newtypes
- Wrapper types encode as their underlying type
- Validation on decode (partial wrappers)
DynamicValue
- Full DynamicValue support for schema-less data
5.2 Non-Functional Requirements
Performance
- Zero-allocation encoding for primitives (use value types)
- Streaming encode/decode (no full materialization)
- Buffer reuse via thread-local pools
- Comparable performance to JSON codec
Compatibility
- Cross-platform: JVM, Scala.js, Scala Native
- Scala 2.13 and Scala 3 support
- No runtime reflection
Specification Compliance
- UTF-8 output with LF line endings
- Consistent indentation (configurable, default 2 spaces)
- No trailing whitespace
- No trailing newline
- Accurate array length markers
- Preserve object key order
5.3 Test Coverage
Unit Tests
- All primitive codecs round-trip correctly
- All array formats encode/decode correctly
- All discriminator kinds work
- All NameMapper variants work
- Error messages include path information
- Edge cases: empty strings, empty arrays, empty objects, deeply nested structures
Property-Based Tests
- Arbitrary case classes round-trip
- Arbitrary sealed traits round-trip
- JSON↔TOON conversion is lossless
Integration Tests
- Large documents (>1MB)
- Deeply nested structures (>100 levels)
- Wide objects (>100 fields)
- Unicode content
5.4 Documentation
- Scaladoc on all public APIs
- Usage examples in tests
- README with quick start guide
- Configuration reference
Part 6: Reference Implementation Notes
6.1 Existing TOON Libraries
toon4s (github.com/vim89/toon4s) provides a Scala TOON implementation with:
- Sealed ADT for TOON values: ToonValue = TNull | TBool | TNumber | TString | TArray | TObj
- JSON↔TOON bidirectional conversion
- Does NOT provide automatic derivation for case classes
TypeScript SDK (github.com/toon-format/toon) is the reference implementation with:
- Complete parser and serializer
- Schema-aware encoding
- Comprehensive test suite
6.2 JSON Codec Reference
The JsonBinaryCodecDeriver in zio-blocks serves as the primary reference for implementation patterns:
- Thread-local caching for recursive types
- Field info classes for optimized encoding
- String map for O(1) field lookup during decoding
- Specialized codecs for primitive arrays
6.3 Test Data
The TOON specification repository includes a test suite at github.com/toon-format/spec/tree/main/tests with:
- Valid TOON documents
- Invalid TOON documents with expected errors
- JSON↔TOON conversion pairs
Appendix A: Example Encodings
Simple Record
case class Person(name: String, age: Int)
val person = Person("Alice", 30)
TOON:
name: Alice
age: 30
Nested Record
case class Address(street: String, city: String)
case class Person(name: String, address: Address)
val person = Person("Alice", Address("123 Main", "Springfield"))
TOON:
name: Alice
address:
  street: 123 Main
  city: Springfield
Uniform Array (Tabular)
case class User(id: Int, name: String)
val users = List(User(1, "Alice"), User(2, "Bob"))
TOON:
[2]{id,name}:
  1,Alice
  2,Bob
Sealed Trait (Key Discriminator)
sealed trait Pet
case class Cat(name: String, lives: Int) extends Pet
case class Dog(name: String, breed: String) extends Pet
val pet: Pet = Cat("Whiskers", 9)
TOON:
Cat:
  name: Whiskers
  lives: 9
Sealed Trait (Field Discriminator)
// With: .withDiscriminatorKind(DiscriminatorKind.Field("type"))
TOON:
type: Cat
name: Whiskers
lives: 9
Case Object Enum
sealed trait Color
case object Red extends Color
case object Green extends Color
case object Blue extends Color
val color: Color = Green
TOON (enumValuesAsStrings = true, default):
Green
TOON (enumValuesAsStrings = false):
Green:
Option Types
case class Config(name: String, timeout: Option[Int])
val config = Config("app", Some(30))
TOON (transientNone = true, default):
name: app
timeout: 30
TOON (None value, transientNone = true):
name: app
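The effect of `transientNone` on this record can be sketched directly. `encode` here is a hypothetical hand-written helper, not derived code, and writing `null` for `None` when `transientNone` is false is an assumption carried over from how a JSON codec would typically behave.

```scala
object OptSketch {
  final case class Config(name: String, timeout: Option[Int])

  def encode(c: Config, transientNone: Boolean = true): List[String] = {
    val timeoutLine = c.timeout match {
      case Some(t)               => List(s"timeout: $t")
      case None if transientNone => Nil                   // field omitted
      case None                  => List("timeout: null") // assumed encoding
    }
    s"name: ${c.name}" :: timeoutLine
  }
}
```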
Appendix B: Error Messages
Error messages should follow the JSON codec pattern with path information:
illegal number with leading zero at: .users[2].age
missing required field "name" at: .config
illegal discriminator at: .event
expected '}' or ',' at: .response.data
unexpected field "extra" at: .request (when rejectExtraFields = true)
array length mismatch: expected 3, got 2 at: .items (when strictArrayLength = true)
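One way to produce paths in this shape is to keep a list of segments while descending and render it only when an error is raised. The sketch below uses hypothetical names (`Segment`, `Field`, `Index`, `message`) and is not taken from the JSON codec.

```scala
object PathSketch {
  sealed trait Segment
  final case class Field(name: String) extends Segment
  final case class Index(i: Int) extends Segment

  // Fields render as `.name`, array positions as `[i]`; the root is `.`
  def render(path: List[Segment]): String =
    if (path.isEmpty) "."
    else path.map {
      case Field(n) => s".$n"
      case Index(i) => s"[$i]"
    }.mkString

  def message(error: String, path: List[Segment]): String =
    s"$error at: ${render(path)}"
}
```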
Appendix C: Configuration Quick Reference
| Option | Type | Default | Description |
|---|---|---|---|
| fieldNameMapper | NameMapper | Identity | Field name transformation |
| caseNameMapper | NameMapper | Identity | Case name transformation |
| discriminatorKind | DiscriminatorKind | Key | ADT encoding strategy |
| arrayFormat | ArrayFormat | Auto | Array encoding preference |
| delimiter | Char | , | Array value separator |
| rejectExtraFields | Boolean | false | Fail on unknown fields |
| enumValuesAsStrings | Boolean | true | Case objects as strings |
| transientNone | Boolean | true | Omit None values |
| requireOptionFields | Boolean | false | Require Option fields |
| transientEmptyCollection | Boolean | true | Omit empty collections |
| requireCollectionFields | Boolean | false | Require collections |
| transientDefaultValue | Boolean | true | Omit default values |
| requireDefaultValueFields | Boolean | false | Require default fields |
| enableKeyFolding | Boolean | false | Dotted key expansion |