
Add TOON support to ZIO Schema 2 #654

@jdegoes


This ticket is for adding TOON support for ZIO Schema 2, as a new Format, with an associated codec, deriver, tests, and documentation.

NOTE: What follows is an AI-generated description of the problem and sketch of solution--it may be useful, but it certainly contains errors, and if you don't know enough to find and fix those errors, you shouldn't attempt to complete this ticket.


TOON Format Implementation Guide for ZIO Schema 2

Executive Summary

This guide outlines how to implement TOON (Token-Oriented Object Notation) codec support in ZIO Schema 2 (zio-blocks). TOON is a compact, human-readable serialization format designed to minimize token usage when passing structured data to Large Language Models. Its authors report roughly 30-60% fewer tokens than JSON, with the largest savings on uniform arrays of objects, while conversion between the two formats stays lossless.

The implementation will follow the established patterns in zio-blocks, mirroring the architecture of JsonBinaryCodecDeriver while adding TOON-specific capabilities for array format selection and indentation-based structure.


Part 1: TOON Format Specification

1.1 Overview

TOON was created by Johann Schopplich in 2025 to address the inefficiency of JSON when used in LLM prompts. The format combines YAML-style indentation with CSV-style tabular data representation. The specification is maintained at github.com/toon-format/spec, currently at version 3.0.

Design goals:

  • Minimize token count for LLM context windows
  • Maintain human readability
  • Enable lossless JSON↔TOON conversion
  • Schema-aware encoding for maximum compression

1.2 Data Types

TOON supports the complete JSON data model:

| Type | TOON Representation | Example |
|---|---|---|
| String | Unquoted (default) or quoted | hello or "hello, world" |
| Number | Decimal form only (no scientific notation) | 42, 3.14159 |
| Boolean | Lowercase keywords | true, false |
| Null | Keyword | null |
| Array | Three formats (see §1.4) | items[3]: a,b,c |
| Object | Indentation-based nesting | See §1.3 |

1.3 Object Encoding

Objects use indentation (2 spaces default) with colon-separated key-value pairs:

name: Alice
age: 30
address:
  street: 123 Main St
  city: Springfield

Equivalent JSON:

{"name":"Alice","age":30,"address":{"street":"123 Main St","city":"Springfield"}}

Key rules:

  • Keys are unquoted unless they contain special characters
  • Values on the same line as keys (primitives) or indented below (nested structures)
  • Empty objects: just the key with colon and nothing following
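As a minimal sketch of these indentation rules (not the proposed ToonWriter), the following renders a tiny value model line by line. The types `TV`, `TStr`, `TNum` and `TObj` are hypothetical and exist only for illustration; the sketch assumes strings never need quoting and uses 2-space indentation.

```scala
object ObjectEncodingSketch {
  // Hypothetical value model for illustration only; not part of zio-blocks.
  sealed trait TV
  final case class TStr(value: String) extends TV
  final case class TNum(value: BigDecimal) extends TV
  final case class TObj(fields: List[(String, TV)]) extends TV

  // One "key: value" line per primitive field; nested objects put "key:" on
  // its own line and their fields one indentation level deeper.
  def render(obj: TObj, indent: Int, depth: Int): List[String] = {
    val pad = " " * (indent * depth)
    obj.fields.flatMap {
      case (key, nested: TObj) => s"$pad$key:" :: render(nested, indent, depth + 1)
      case (key, TStr(s))      => List(s"$pad$key: $s") // assumes s needs no quoting
      case (key, TNum(n))      => List(s"$pad$key: ${n.bigDecimal.toPlainString}")
    }
  }

  // Lines are joined with LF; there is no trailing newline.
  def encode(obj: TObj, indent: Int = 2): String =
    render(obj, indent, 0).mkString("\n")
}
```

An empty nested object falls out naturally as a bare `key:` line, matching the last rule above.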

1.4 Array Encoding Formats

TOON's primary innovation is intelligent array encoding. The format supports three array representations:

Tabular Format (Maximum Compression)

For arrays of uniform objects where all elements share identical keys with only primitive values:

users[3]{id,name,email}:
  1,Alice,[email protected]
  2,Bob,[email protected]
  3,Carol,[email protected]

Equivalent JSON:

{"users":[{"id":1,"name":"Alice","email":"[email protected]"},{"id":2,"name":"Bob","email":"[email protected]"},{"id":3,"name":"Carol","email":"[email protected]"}]}

Tabular eligibility requirements:

  1. All elements must be objects
  2. All objects must have identical keys in the same order
  3. All field values must be primitives (not nested objects or arrays)
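The header-plus-rows shape can be sketched at the string level. This is an illustration, not the proposed ToonWriter API: `TabularSketch` and `tabular` are hypothetical names, each row is an ordered list of (field, already-encoded value) pairs, quoting is skipped, and declaring a non-comma delimiter inside the brackets (e.g. `[2|]`) reflects our reading of the spec rather than a verified rule.

```scala
object TabularSketch {
  // Renders `key[N]{f1,f2,...}:` followed by one indented row per element.
  def tabular(
      key: String,
      rows: Seq[Seq[(String, String)]],
      delimiter: Char = ',',
      indent: Int = 2
  ): String = {
    require(rows.nonEmpty, "tabular form needs at least one row")
    val fields = rows.head.map(_._1)
    // Eligibility rule 2: identical keys in the same order on every row.
    require(rows.forall(_.map(_._1) == fields), "rows must share identical keys")
    val d = delimiter.toString
    // Comma is implicit; other delimiters are declared inside the brackets (assumption).
    val marker = if (delimiter == ',') "" else d
    val header = s"$key[${rows.length}$marker]{${fields.mkString(d)}}:"
    val body   = rows.map(row => " " * indent + row.map(_._2).mkString(d))
    (header +: body).mkString("\n")
  }
}
```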

Inline Format (Primitive Arrays)

For arrays containing only primitive values:

tags[4]: javascript,react,typescript,node
numbers[5]: 1,2,3,4,5
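A minimal sketch of the inline form, assuming values arrive already encoded (numbers formatted, strings quoted where needed). `inlineArray` is a hypothetical helper; the name avoids `inline`, which is a keyword in Scala 3. The non-comma delimiter marker in the brackets is the same assumption as in the tabular sketch.

```scala
object InlineArraySketch {
  // Renders a primitive array as `key[N]: v1,v2,...` on a single line.
  def inlineArray(key: String, values: Seq[String], delimiter: Char = ','): String = {
    val d      = delimiter.toString
    val marker = if (delimiter == ',') "" else d // assumed bracket marker for tab/pipe
    val header = s"$key[${values.length}$marker]:"
    // An empty array is just the header, with no trailing space.
    if (values.isEmpty) header else s"$header ${values.mkString(d)}"
  }
}
```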

List Format (Heterogeneous Data)

For arrays with mixed types, nested structures, or non-uniform objects:

items[3]:
  - name: Widget
    price: 9.99
  - name: Gadget
    price: 19.99
  - simple string value
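The list layout above can be reproduced with a small sketch. `Item`, `PrimItem` and `ObjItem` are a hypothetical model limited to primitives and flat objects; an object item puts its first field on the `- ` line and the remaining fields one level deeper. Rendering an empty object item as a bare hyphen is an assumption.

```scala
object ListFormatSketch {
  // Hypothetical item model: a primitive, or an object with primitive fields only.
  sealed trait Item
  final case class PrimItem(text: String) extends Item
  final case class ObjItem(fields: List[(String, String)]) extends Item

  def listArray(key: String, items: Seq[Item], indent: Int = 2): String = {
    val pad = " " * indent
    val lines = items.flatMap {
      case PrimItem(text) => List(s"$pad- $text")
      case ObjItem(Nil)   => List(s"$pad-") // empty object item (assumption)
      // First field shares the "- " line; remaining fields sit one level deeper.
      case ObjItem((k, v) :: rest) =>
        s"$pad- $k: $v" :: rest.map { case (k2, v2) => s"$pad$pad$k2: $v2" }
    }
    (s"$key[${items.length}]:" +: lines).mkString("\n")
  }
}
```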

1.5 String Quoting Rules

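Strings are unquoted by default. Quotes are required only when the string:

  • Is empty
  • Has leading or trailing whitespace
  • Is exactly true, false, or null, or looks like a number (unquoted, it would decode as a boolean, null, or number)
  • Starts with a hyphen - (it could be read as a list-item marker)
  • Contains the active delimiter (comma by default)
  • Contains a colon :
  • Contains a double quote " or a backslash \
  • Contains the characters {, }, [, ]
  • Contains control characters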

Escape sequences (only these five are valid):

  • \\ → backslash
  • \" → double quote
  • \n → newline
  • \r → carriage return
  • \t → tab
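Once a string needs quoting, only the five escapes above may appear. A minimal sketch of that quoting step, roughly what the guide's proposed `writeQuotedString` would do; `QuotingSketch.quote` is a hypothetical name, and how other control characters should be handled is left open here.

```scala
object QuotingSketch {
  // Wraps s in double quotes, using only the five escapes TOON permits.
  // All other characters are copied unchanged (other control characters
  // are not handled specially; the spec may require rejecting them).
  def quote(s: String): String = {
    val sb = new StringBuilder(s.length + 2)
    sb.append('"')
    s.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }
}
```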

1.6 Number Formatting

TOON requires decimal form without scientific notation:

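| Value | JSON | TOON |
|---|---|---|
| 15 billion | 1.5e10 | 15000000000 |
| Tiny | 1e-10 | 0.0000000001 |
| NaN | N/A (not representable) | null |
| Infinity | N/A (not representable) | null |
| -0 | -0 | 0 |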
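The rows of this table can be checked with a small sketch for Double values. It relies on `java.lang.Double.toString` producing digits that round-trip and on `java.math.BigDecimal.toPlainString` to drop the exponent. `formatDouble` is a hypothetical helper, and stripping trailing fractional zeros (so 42.0 becomes 42) is our assumption about the canonical form.

```scala
object NumberSketch {
  // Canonical TOON text for a Double: no exponent, -0 -> 0, non-finite -> null.
  def formatDouble(d: Double): String =
    if (java.lang.Double.isNaN(d) || java.lang.Double.isInfinite(d)) "null"
    else if (d == 0.0) "0" // also true for -0.0, since -0.0 == 0.0
    else
      new java.math.BigDecimal(java.lang.Double.toString(d))
        .stripTrailingZeros() // assumption: 42.0 is written as 42
        .toPlainString()      // never uses scientific notation
}
```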

1.7 Key Folding (Optional)

Chains of single-key wrapper objects can be collapsed:

user.profile.settings.theme: dark

Equivalent to:

user:
  profile:
    settings:
      theme: dark
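Decoding a folded document means splitting dotted keys back into nested objects. A minimal sketch over a flat list of (key, value) pairs, assuming every dot is a path separator. The spec's actual rules are narrower (a key that legitimately contains a dot must not be split, for instance), and this sketch does not model them; `expand` is a hypothetical name.

```scala
object KeyFoldingSketch {
  // Expands dotted keys such as "user.profile.theme" into nested maps.
  // Conflicting paths are not detected: a later entry silently overwrites.
  def expand(flat: Seq[(String, String)]): Map[String, Any] =
    flat.foldLeft(Map.empty[String, Any]) { case (acc, (path, value)) =>
      insert(acc, path.split('.').toList, value)
    }

  private def insert(m: Map[String, Any], path: List[String], value: String): Map[String, Any] =
    path match {
      case Nil        => m
      case key :: Nil => m.updated(key, value)
      case key :: rest =>
        // Reuse an existing nested map so sibling keys merge.
        val child = m.get(key) match {
          case Some(existing: Map[String, Any] @unchecked) => existing
          case _                                           => Map.empty[String, Any]
        }
        m.updated(key, insert(child, rest, value))
    }
}
```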

Part 2: ZIO Schema 2 Architecture

2.1 Core Abstractions

ZIO Schema 2 uses a deriver-based architecture where format codecs are derived from Schema[A] definitions. The key components are:

// The schema definition
case class Person(name: String, age: Int)
object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Deriving a codec
val jsonCodec: JsonBinaryCodec[Person] = Schema[Person].derive(JsonFormat.deriver)

2.2 Deriver Trait

The Deriver[TC[_]] trait defines how to derive type class instances for different schema shapes:

trait Deriver[TC[_]] {
  def derivePrimitive[F[_, _], A](
    primitiveType: PrimitiveType[A],
    typeName: TypeName[A],
    binding: Binding[BindingType.Primitive, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  ): Lazy[TC[A]]

  def deriveRecord[F[_, _], A](
    fields: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Record, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[A]]

  def deriveVariant[F[_, _], A](
    cases: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Variant, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[A]]

  def deriveSequence[F[_, _], C[_], A](
    element: Reflect[F, A],
    typeName: TypeName[C[A]],
    binding: Binding[BindingType.Seq[C], C[A]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[C[A]]]

  def deriveMap[F[_, _], M[_, _], K, V](
    key: Reflect[F, K],
    value: Reflect[F, V],
    typeName: TypeName[M[K, V]],
    binding: Binding[BindingType.Map[M], M[K, V]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[M[K, V]]]

  def deriveDynamic[F[_, _]](
    binding: Binding[BindingType.Dynamic, DynamicValue],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[DynamicValue]]

  def deriveWrapper[F[_, _], A, B](
    wrapped: Reflect[F, B],
    typeName: TypeName[A],
    wrapperPrimitiveType: Option[PrimitiveType[A]],
    binding: Binding[BindingType.Wrapper[A, B], A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[TC[A]]
}

2.3 BinaryCodec Pattern

Codecs extend BinaryCodec[A] and work with streaming readers/writers:

abstract class JsonBinaryCodec[A](val valueType: Int = JsonBinaryCodec.objectType) 
    extends BinaryCodec[A] {
  
  // Core methods to implement
  def decodeValue(in: JsonReader, default: A): A
  def encodeValue(x: A, out: JsonWriter): Unit
  
  // Optional key encoding (for map keys)
  def decodeKey(in: JsonReader): A
  def encodeKey(x: A, out: JsonWriter): Unit
  
  // Null value for initialization
  def nullValue: A = null.asInstanceOf[A]
  
  // Public API
  def decode(input: ByteBuffer, config: ReaderConfig): Either[SchemaError, A]
  def encode(value: A, output: ByteBuffer, config: WriterConfig): Unit
}

2.4 Configuration Architecture

Configuration is split between two concerns:

Semantic configuration lives on the deriver class itself:

class JsonBinaryCodecDeriver(
  fieldNameMapper: NameMapper,           // Field name transformation
  caseNameMapper: NameMapper,            // Case/variant name transformation  
  discriminatorKind: DiscriminatorKind,  // ADT encoding strategy
  rejectExtraFields: Boolean,            // Fail on unknown fields
  enumValuesAsStrings: Boolean,          // Enum encoding style
  transientNone: Boolean,                // Omit None values
  requireOptionFields: Boolean,          // Require Option fields
  transientEmptyCollection: Boolean,     // Omit empty collections
  requireCollectionFields: Boolean,      // Require collection fields
  transientDefaultValue: Boolean,        // Omit default-valued fields
  requireDefaultValueFields: Boolean     // Require fields with defaults
) extends Deriver[JsonBinaryCodec]

Runtime configuration lives in separate config classes:

// ReaderConfig: buffer sizes and parsing behavior
class ReaderConfig(
  val preferredBufSize: Int,      // Default: 32768
  val preferredCharBufSize: Int,  // Default: 4096
  val maxBufSize: Int,            // Default: 33554432
  val maxCharBufSize: Int,        // Default: 4194304
  val checkForEndOfInput: Boolean // Default: true
)

// WriterConfig: output formatting
class WriterConfig(
  val indentionStep: Int,     // Default: 0 (compact)
  val preferredBufSize: Int,  // Default: 32768
  val escapeUnicode: Boolean  // Default: false
)

2.5 DiscriminatorKind for ADTs

Sum types (sealed traits) support three encoding strategies:

sealed trait DiscriminatorKind

object DiscriminatorKind {
  // Wrapper object: {"Cat": {"name": "Whiskers"}}
  case object Key extends DiscriminatorKind  // DEFAULT
  
  // Embedded field: {"type": "Cat", "name": "Whiskers"}
  case class Field(name: String) extends DiscriminatorKind
  
  // No discriminator: try each case sequentially
  case object None extends DiscriminatorKind
}

2.6 NameMapper for Field Transformation

sealed trait NameMapper extends (String => String)

object NameMapper {
  case object Identity extends NameMapper   // No transformation (default)
  case object SnakeCase extends NameMapper  // memberName → member_name
  case object CamelCase extends NameMapper  // member_name → memberName
  case object PascalCase extends NameMapper // member_name → MemberName
  case object KebabCase extends NameMapper  // memberName → member-name
  case class Custom(f: String => String) extends NameMapper
}

2.7 Modifier System

ZIO Schema 2 uses Modifier classes (not Java annotations) for customization:

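// Rename a field (on a case class or case object used as a variant case, renames the case)
case class Example(@Modifier.rename("new_name") field: String)

// Add decoding aliases
@Modifier.alias("old_name")
case object Blue extends Color

// Mark a field as transient (excluded from serialization); give it a default
// so the decoder has a value to fill in
case class Cached(name: String, @Modifier.transient() internalField: Int = 0)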

Programmatic application:

val codec = Color.schema
  .deriving(JsonBinaryCodecDeriver)
  .modifier(Color.red, Modifier.rename("Rose"))
  .modifier(Color.red, Modifier.alias("Ruby"))
  .derive

Part 3: TOON Implementation Design

3.1 Module Structure

zio-blocks/
└── schema-toon/
    └── src/main/scala/zio/blocks/schema/toon/
        ├── ToonFormat.scala           # Format definition object
        ├── ToonBinaryCodec.scala      # Abstract codec class
        ├── ToonBinaryCodecDeriver.scala # Deriver implementation
        ├── ToonReader.scala           # Streaming parser
        ├── ToonWriter.scala           # Streaming serializer
        ├── ReaderConfig.scala         # Parser configuration
        ├── WriterConfig.scala         # Serializer configuration
        ├── ArrayFormat.scala          # TOON-specific array encoding
        └── DiscriminatorKind.scala    # Reuse or extend from JSON

3.2 ToonFormat Object

package zio.blocks.schema.toon

import zio.blocks.schema.codec.BinaryFormat

/**
 * The TOON format for ZIO Schema 2.
 * 
 * TOON (Token-Oriented Object Notation) is a compact serialization format
 * optimized for LLM token efficiency, achieving 30-60% reduction vs JSON.
 */
object ToonFormat extends BinaryFormat("application/toon", ToonBinaryCodecDeriver)

3.3 ArrayFormat Enum

package zio.blocks.schema.toon

/**
 * Specifies how arrays should be encoded in TOON format.
 */
sealed trait ArrayFormat

object ArrayFormat {
  /**
   * Automatically select the most compact format based on array contents:
   * - Tabular for uniform object arrays with primitive fields
   * - Inline for primitive arrays
   * - List for heterogeneous or nested data
   */
  case object Auto extends ArrayFormat
  
  /**
   * Force tabular format: `items[N]{field1,field2}: val1,val2`
   * Falls back to List if array is not tabular-eligible.
   */
  case object Tabular extends ArrayFormat
  
  /**
   * Force inline format: `items[N]: val1,val2,val3`
   * Only valid for primitive arrays.
   */
  case object Inline extends ArrayFormat
  
  /**
   * Force list format with `- ` markers.
   */
  case object List extends ArrayFormat
}

3.4 ToonBinaryCodecDeriver

package zio.blocks.schema.toon

import zio.blocks.schema._
import zio.blocks.schema.binding._
import zio.blocks.schema.codec.BinaryFormat
import zio.blocks.schema.derive._
import zio.blocks.schema.json.{DiscriminatorKind, NameMapper}

/**
 * Default TOON deriver with standard settings.
 */
object ToonBinaryCodecDeriver extends ToonBinaryCodecDeriver(
  fieldNameMapper = NameMapper.Identity,
  caseNameMapper = NameMapper.Identity,
  discriminatorKind = DiscriminatorKind.Key,
  arrayFormat = ArrayFormat.Auto,
  delimiter = ',',
  rejectExtraFields = false,
  enumValuesAsStrings = true,
  transientNone = true,
  requireOptionFields = false,
  transientEmptyCollection = true,
  requireCollectionFields = false,
  transientDefaultValue = true,
  requireDefaultValueFields = false,
  enableKeyFolding = false
)

/**
 * Deriver for TOON binary codecs with configurable behavior.
 *
 * @param fieldNameMapper       Transform strategy for field names
 * @param caseNameMapper        Transform strategy for variant case names  
 * @param discriminatorKind     ADT encoding strategy (Key, Field, None)
 * @param arrayFormat           Array encoding preference (Auto, Tabular, Inline, List)
 * @param delimiter             Value separator in tabular/inline arrays (comma default)
 * @param rejectExtraFields     Fail decoding on unrecognized fields
 * @param enumValuesAsStrings   Encode case object enums as strings
 * @param transientNone         Omit None-valued Option fields
 * @param requireOptionFields   Require Option fields to be present
 * @param transientEmptyCollection  Omit empty collection fields
 * @param requireCollectionFields   Require collection fields to be present
 * @param transientDefaultValue     Omit fields matching their default value
 * @param requireDefaultValueFields Require fields with defaults to be present
 * @param enableKeyFolding      Enable dotted key path expansion
 */
class ToonBinaryCodecDeriver private[toon] (
  fieldNameMapper: NameMapper,
  caseNameMapper: NameMapper,
  discriminatorKind: DiscriminatorKind,
  arrayFormat: ArrayFormat,
  delimiter: Char,
  rejectExtraFields: Boolean,
  enumValuesAsStrings: Boolean,
  transientNone: Boolean,
  requireOptionFields: Boolean,
  transientEmptyCollection: Boolean,
  requireCollectionFields: Boolean,
  transientDefaultValue: Boolean,
  requireDefaultValueFields: Boolean,
  enableKeyFolding: Boolean
) extends Deriver[ToonBinaryCodec] {

  // Builder methods
  def withFieldNameMapper(mapper: NameMapper): ToonBinaryCodecDeriver =
    copy(fieldNameMapper = mapper)
    
  def withCaseNameMapper(mapper: NameMapper): ToonBinaryCodecDeriver =
    copy(caseNameMapper = mapper)
    
  def withDiscriminatorKind(kind: DiscriminatorKind): ToonBinaryCodecDeriver =
    copy(discriminatorKind = kind)
    
  def withArrayFormat(format: ArrayFormat): ToonBinaryCodecDeriver =
    copy(arrayFormat = format)
    
  def withDelimiter(delim: Char): ToonBinaryCodecDeriver =
    copy(delimiter = delim)
    
  def withRejectExtraFields(reject: Boolean): ToonBinaryCodecDeriver =
    copy(rejectExtraFields = reject)
    
  def withEnumValuesAsStrings(asStrings: Boolean): ToonBinaryCodecDeriver =
    copy(enumValuesAsStrings = asStrings)
    
  def withTransientNone(transient: Boolean): ToonBinaryCodecDeriver =
    copy(transientNone = transient)
    
  def withKeyFolding(enabled: Boolean): ToonBinaryCodecDeriver =
    copy(enableKeyFolding = enabled)

  // ... additional builder methods ...

  private def copy(
    fieldNameMapper: NameMapper = fieldNameMapper,
    caseNameMapper: NameMapper = caseNameMapper,
    discriminatorKind: DiscriminatorKind = discriminatorKind,
    arrayFormat: ArrayFormat = arrayFormat,
    delimiter: Char = delimiter,
    rejectExtraFields: Boolean = rejectExtraFields,
    enumValuesAsStrings: Boolean = enumValuesAsStrings,
    transientNone: Boolean = transientNone,
    requireOptionFields: Boolean = requireOptionFields,
    transientEmptyCollection: Boolean = transientEmptyCollection,
    requireCollectionFields: Boolean = requireCollectionFields,
    transientDefaultValue: Boolean = transientDefaultValue,
    requireDefaultValueFields: Boolean = requireDefaultValueFields,
    enableKeyFolding: Boolean = enableKeyFolding
  ): ToonBinaryCodecDeriver = new ToonBinaryCodecDeriver(
    fieldNameMapper, caseNameMapper, discriminatorKind, arrayFormat,
    delimiter, rejectExtraFields, enumValuesAsStrings, transientNone,
    requireOptionFields, transientEmptyCollection, requireCollectionFields,
    transientDefaultValue, requireDefaultValueFields, enableKeyFolding
  )

  // Deriver implementation
  override def derivePrimitive[F[_, _], A](
    primitiveType: PrimitiveType[A],
    typeName: TypeName[A],
    binding: Binding[BindingType.Primitive, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  ): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: return appropriate codec for primitive type
    ???
  }

  override def deriveRecord[F[_, _], A](
    fields: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Record, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: derive codec for case class / record
    ???
  }

  override def deriveVariant[F[_, _], A](
    cases: IndexedSeq[Term[F, A, ?]],
    typeName: TypeName[A],
    binding: Binding[BindingType.Variant, A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: derive codec for sealed trait / enum
    // Handle discriminatorKind, enumValuesAsStrings, caseNameMapper
    ???
  }

  override def deriveSequence[F[_, _], C[_], A](
    element: Reflect[F, A],
    typeName: TypeName[C[A]],
    binding: Binding[BindingType.Seq[C], C[A]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[C[A]]] = Lazy {
    // Implementation: derive codec for sequences
    // Key TOON logic: select array format based on arrayFormat setting
    // and element uniformity analysis
    ???
  }

  override def deriveMap[F[_, _], M[_, _], K, V](
    key: Reflect[F, K],
    value: Reflect[F, V],
    typeName: TypeName[M[K, V]],
    binding: Binding[BindingType.Map[M], M[K, V]],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[M[K, V]]] = Lazy {
    // Implementation: derive codec for maps
    ???
  }

  override def deriveDynamic[F[_, _]](
    binding: Binding[BindingType.Dynamic, DynamicValue],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[DynamicValue]] = Lazy {
    // Implementation: derive codec for dynamic values
    ???
  }

  override def deriveWrapper[F[_, _], A, B](
    wrapped: Reflect[F, B],
    typeName: TypeName[A],
    wrapperPrimitiveType: Option[PrimitiveType[A]],
    binding: Binding[BindingType.Wrapper[A, B], A],
    doc: Doc,
    modifiers: Seq[Modifier.Reflect]
  )(implicit F: HasBinding[F], D: HasInstance[F]): Lazy[ToonBinaryCodec[A]] = Lazy {
    // Implementation: derive codec for wrapper types (newtypes)
    ???
  }
}

3.5 ToonBinaryCodec

package zio.blocks.schema.toon

import zio.blocks.schema.SchemaError
import zio.blocks.schema.codec.BinaryCodec
import java.nio.ByteBuffer

/**
 * Abstract codec for TOON encoding/decoding.
 *
 * @param valueType Optimization hint for primitive types
 */
abstract class ToonBinaryCodec[A](val valueType: Int = ToonBinaryCodec.objectType) 
    extends BinaryCodec[A] {

  /**
   * Decode a value from a TOON reader.
   *
   * @param in      The TOON reader providing input
   * @param default Default value for initialization
   * @return The decoded value
   */
  def decodeValue(in: ToonReader, default: A): A

  /**
   * Encode a value to a TOON writer.
   *
   * @param x   The value to encode
   * @param out The TOON writer for output
   */
  def encodeValue(x: A, out: ToonWriter): Unit

  /**
   * Decode a value used as a map key.
   */
  def decodeKey(in: ToonReader): A = 
    in.decodeError("decoding as TOON key is not supported")

  /**
   * Encode a value as a map key.
   */
  def encodeKey(x: A, out: ToonWriter): Unit = 
    out.encodeError("encoding as TOON key is not supported")

  /**
   * The null/default value for this type.
   */
  def nullValue: A = null.asInstanceOf[A]

  // Public API
  override def decode(input: ByteBuffer): Either[SchemaError, A] = 
    decode(input, ToonReaderConfig)

  override def encode(value: A, output: ByteBuffer): Unit = 
    encode(value, output, ToonWriterConfig)

  def decode(input: ByteBuffer, config: ToonReaderConfig): Either[SchemaError, A]
  
  def encode(value: A, output: ByteBuffer, config: ToonWriterConfig): Unit

  // Convenience methods for byte arrays and strings
  def decodeFromString(input: String): Either[SchemaError, A]
  def encodeToString(value: A): String
}

object ToonBinaryCodec {
  val objectType  = 0
  val intType     = 1
  val longType    = 2
  val floatType   = 3
  val doubleType  = 4
  val booleanType = 5
  val byteType    = 6
  val charType    = 7
  val shortType   = 8
  val unitType    = 9
  
  // Predefined primitive codecs
  val unitCodec: ToonBinaryCodec[Unit] = ???
  val booleanCodec: ToonBinaryCodec[Boolean] = ???
  val byteCodec: ToonBinaryCodec[Byte] = ???
  val shortCodec: ToonBinaryCodec[Short] = ???
  val intCodec: ToonBinaryCodec[Int] = ???
  val longCodec: ToonBinaryCodec[Long] = ???
  val floatCodec: ToonBinaryCodec[Float] = ???
  val doubleCodec: ToonBinaryCodec[Double] = ???
  val charCodec: ToonBinaryCodec[Char] = ???
  val stringCodec: ToonBinaryCodec[String] = ???
  val bigIntCodec: ToonBinaryCodec[BigInt] = ???
  val bigDecimalCodec: ToonBinaryCodec[BigDecimal] = ???
  // ... java.time codecs, UUID, Currency, etc.
}

3.6 Configuration Classes

package zio.blocks.schema.toon

/**
 * Configuration for ToonReader.
 *
 * @param preferredBufSize     Preferred byte buffer size
 * @param preferredCharBufSize Preferred char buffer size  
 * @param maxBufSize           Maximum byte buffer size
 * @param maxCharBufSize       Maximum char buffer size
 * @param checkForEndOfInput   Verify no trailing content after parsing
 * @param strictArrayLength    Validate array length markers match actual count
 */
class ToonReaderConfig private (
  val preferredBufSize: Int,
  val preferredCharBufSize: Int,
  val maxBufSize: Int,
  val maxCharBufSize: Int,
  val checkForEndOfInput: Boolean,
  val strictArrayLength: Boolean
) extends Serializable {
  // Assumes a private copy(...) helper like the one on ToonBinaryCodecDeriver.
  def withStrictArrayLength(strict: Boolean): ToonReaderConfig =
    copy(strictArrayLength = strict)
  // ... other builder methods
}

object ToonReaderConfig extends ToonReaderConfig(
  preferredBufSize = 32768,
  preferredCharBufSize = 4096,
  maxBufSize = 33554432,
  maxCharBufSize = 4194304,
  checkForEndOfInput = true,
  strictArrayLength = true
)

/**
 * Configuration for ToonWriter.
 *
 * @param indentSize       Spaces per indentation level (default: 2)
 * @param preferredBufSize Preferred output buffer size
 * @param lineEnding       Line ending style (LF recommended per spec)
 */
class ToonWriterConfig private (
  val indentSize: Int,
  val preferredBufSize: Int,
  val lineEnding: String
) extends Serializable {
  // Assumes a private copy(...) helper like the one on ToonBinaryCodecDeriver.
  def withIndentSize(size: Int): ToonWriterConfig =
    copy(indentSize = size)
  // ... other builder methods
}

object ToonWriterConfig extends ToonWriterConfig(
  indentSize = 2,
  preferredBufSize = 32768,
  lineEnding = "\n"
)

Part 4: Encoding Rules and Algorithms

4.1 Array Format Selection Algorithm

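When ArrayFormat.Auto is configured, the encoder must pick a format for each array. Whether elements are primitives or records is known from the element schema at derivation time, so that part of the decision can be made once in deriveSequence; only value-dependent conditions (for example, Option fields omitted under transientNone, which make the rows' key sets differ) require inspecting the actual elements: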

def selectArrayFormat[A](elements: Iterable[A], elementCodec: ToonBinaryCodec[A]): ArrayFormat = {
  if (elements.isEmpty) {
    ArrayFormat.Inline  // Empty arrays: items[0]:
  } else if (isPrimitiveCodec(elementCodec)) {
    ArrayFormat.Inline  // Primitive arrays: items[3]: a,b,c
  } else if (isUniformObjectArray(elements)) {
    ArrayFormat.Tabular // Uniform objects: items[N]{fields}: rows...
  } else {
    ArrayFormat.List    // Everything else: - item format
  }
}

def isUniformObjectArray[A](elements: Iterable[A]): Boolean = {
  // Check that:
  // 1. All elements are objects (case classes)
  // 2. All have identical field names in same order
  // 3. All field values are primitives (not nested objects/arrays)
  ???
}
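A runnable version of the uniformity check, written against a small hypothetical value model (`TV`, `Prim`, `Obj`, `Arr`) rather than real codecs, since the stub above has no way to see field values. Treating an array of empty objects as non-tabular is an assumption.

```scala
object UniformitySketch {
  // Hypothetical value model for illustration only.
  sealed trait TV
  final case class Prim(text: String) extends TV
  final case class Obj(fields: List[(String, TV)]) extends TV
  final case class Arr(items: List[TV]) extends TV

  def isUniformObjectArray(elements: Seq[TV]): Boolean =
    elements.headOption match {
      case Some(Obj(firstFields)) if firstFields.nonEmpty =>
        val keys = firstFields.map(_._1)
        elements.forall {
          // 1. every element is an object, 2. same keys in the same order,
          // 3. every field value is a primitive
          case Obj(fields) => fields.map(_._1) == keys && fields.forall(_._2.isInstanceOf[Prim])
          case _           => false
        }
      case _ => false // empty arrays, primitive arrays, empty objects
    }
}
```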

4.2 String Encoding Rules

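def requiresQuoting(s: String, delimiter: Char): Boolean = {
  s.isEmpty ||
  s.charAt(0).isWhitespace ||
  s.charAt(s.length - 1).isWhitespace ||
  s == "true" || s == "false" || s == "null" ||  // would decode as boolean/null
  looksNumeric(s) ||                             // hypothetical helper: "42", "-3.5", "1e6" would decode as numbers
  s.charAt(0) == '-' ||                          // could be read as a list-item marker
  s.indexOf(delimiter) >= 0 ||
  s.exists(c => c == ':' || c == '"' || c == '\\' || c == '{' || c == '}' || c == '[' || c == ']') ||
  s.exists(c => Character.isISOControl(c))
}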

def encodeString(s: String, delimiter: Char, out: ToonWriter): Unit = {
  if (requiresQuoting(s, delimiter)) {
    out.writeQuotedString(s)  // Escape \, ", \n, \r, \t
  } else {
    out.writeRawString(s)
  }
}

4.3 Number Encoding Rules

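// NaN, Infinity and -0 exist only for Float/Double, so these checks belong in the
// floating-point codecs; BigDecimal and BigInt values need only toPlainString.
def encodeDouble(d: Double, out: ToonWriter): Unit = {
  if (java.lang.Double.isNaN(d) || java.lang.Double.isInfinite(d)) {
    out.writeNull()                 // no TOON number form for NaN/±Infinity
  } else if (d == 0.0) {
    out.writeRaw("0")               // true for both 0.0 and -0.0, so -0 normalizes to 0
  } else {
    // Double.toString round-trips; toPlainString removes the exponent.
    // Floats should use java.lang.Float.toString instead of widening to Double.
    out.writeRaw(new java.math.BigDecimal(java.lang.Double.toString(d)).stripTrailingZeros().toPlainString())
  }
}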

4.4 ADT Encoding with Discriminators

DiscriminatorKind.Key (default):

Cat:
  name: Whiskers
  lives: 9

DiscriminatorKind.Field("type"):

type: Cat
name: Whiskers
lives: 9

DiscriminatorKind.None:

name: Whiskers
lives: 9

(Decoder tries each case sequentially)
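The sequential strategy can be sketched independently of TOON parsing. Each case decoder here is a hypothetical function from already-parsed fields to `Either`, and the first success wins, so declaration order matters whenever cases share field names (Cat and Dog both have name).

```scala
object NoDiscriminatorSketch {
  // A case decoder either produces a value or explains why it did not match.
  type CaseDecoder[A] = Map[String, String] => Either[String, A]

  // Try each case in declaration order and keep the first successful decode.
  def decodeFirst[A](fields: Map[String, String], cases: Seq[CaseDecoder[A]]): Either[String, A] =
    cases.iterator
      .map(decode => decode(fields))
      .collectFirst { case Right(value) => value }
      .toRight("no case matched")
}
```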

4.5 Tabular Array Encoding

For uniform object arrays:

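def encodeTabularArray[A](
  fieldName: String,
  elements: IndexedSeq[A],
  fieldNames: IndexedSeq[String],
  fieldCodecs: IndexedSeq[ToonBinaryCodec[Any]], // element field codecs, widened to Any
  getField: (A, Int) => Any,                     // hypothetical accessor, e.g. via the record binding
  delimiter: Char,                               // the deriver's configured delimiter, not a hard-coded ','
  out: ToonWriter
): Unit = {
  val d = delimiter.toString
  // Header: fieldName[count]{field1,field2,...}: ; a non-comma delimiter is declared in the brackets, e.g. [2|]
  out.writeRaw(fieldName)
  out.writeRaw("[")
  out.writeRaw(elements.length.toString)
  if (delimiter != ',') out.writeRaw(d)
  out.writeRaw("]{")
  out.writeRaw(fieldNames.mkString(d))
  out.writeRaw("}:")
  // Rows: value1<delim>value2..., each on a new line one level below the header.
  // Emitting the newline before each row avoids a trailing newline at end of document.
  elements.foreach { element =>
    out.newLine()
    out.writeIndent()
    fieldCodecs.indices.foreach { idx =>
      if (idx > 0) out.writeRaw(d)
      fieldCodecs(idx).encodeValue(getField(element, idx), out) // string codecs must quote on the active delimiter
    }
  }
}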

Part 5: Acceptance Criteria

5.1 Functional Requirements

Primitive Types

  • All primitive types encode/decode correctly: Unit, Boolean, Byte, Short, Int, Long, Float, Double, Char, String, BigInt, BigDecimal
  • All java.time types: Instant, LocalDate, LocalTime, LocalDateTime, OffsetDateTime, ZonedDateTime, Duration, Period, Year, YearMonth, MonthDay, Month, DayOfWeek, ZoneId, ZoneOffset
  • UUID and Currency types
  • Numbers use decimal form (no scientific notation)
  • NaN and Infinity encode as null
  • -0 normalizes to 0

Strings

  • Unquoted strings work for simple values
  • Quoted strings handle delimiters, colons, whitespace, control characters
  • Only valid escape sequences: \\, \", \n, \r, \t
  • UTF-8 encoding with LF line endings

Arrays

  • ArrayFormat.Auto selects optimal format
  • Tabular format for uniform object arrays
  • Inline format for primitive arrays
  • List format for heterogeneous data
  • Array length markers [N] are accurate
  • Empty arrays encode correctly: items[0]:
  • Custom delimiter support (comma, tab, pipe)

Objects/Records

  • Indentation-based nesting works correctly
  • Field name transformation via NameMapper
  • Transient field handling (None, empty collections, defaults)
  • Required field validation
  • Extra field rejection (configurable)
  • Modifier.rename and Modifier.alias support

ADTs/Variants

  • DiscriminatorKind.Key (wrapper object) works
  • DiscriminatorKind.Field embeds discriminator
  • DiscriminatorKind.None tries cases sequentially
  • Case name transformation via NameMapper
  • enumValuesAsStrings for case object enums
  • Nested ADTs work correctly
  • Modifier.rename and Modifier.alias on cases

Maps

  • String-keyed maps encode as objects
  • Non-string-keyed maps use array of pairs or error

Wrappers/Newtypes

  • Wrapper types encode as their underlying type
  • Validation on decode (partial wrappers)

DynamicValue

  • Full DynamicValue support for schema-less data

5.2 Non-Functional Requirements

Performance

  • Zero-allocation encoding for primitives (use value types)
  • Streaming encode/decode (no full materialization)
  • Buffer reuse via thread-local pools
  • Comparable performance to JSON codec

Compatibility

  • Cross-platform: JVM, Scala.js, Scala Native
  • Scala 2.13 and Scala 3 support
  • No runtime reflection

Specification Compliance

  • UTF-8 output with LF line endings
  • Consistent indentation (configurable, default 2 spaces)
  • No trailing whitespace
  • No trailing newline
  • Accurate array length markers
  • Preserve object key order

5.3 Test Coverage

Unit Tests

  • All primitive codecs round-trip correctly
  • All array formats encode/decode correctly
  • All discriminator kinds work
  • All NameMapper variants work
  • Error messages include path information
  • Edge cases: empty strings, empty arrays, empty objects, deeply nested structures

Property-Based Tests

  • Arbitrary case classes round-trip
  • Arbitrary sealed traits round-trip
  • JSON↔TOON conversion is lossless

Integration Tests

  • Large documents (>1MB)
  • Deeply nested structures (>100 levels)
  • Wide objects (>100 fields)
  • Unicode content

5.4 Documentation

  • Scaladoc on all public APIs
  • Usage examples in tests
  • README with quick start guide
  • Configuration reference

Part 6: Reference Implementation Notes

6.1 Existing TOON Libraries

toon4s (github.com/vim89/toon4s) provides a Scala TOON implementation with:

  • Sealed ADT for TOON values: ToonValue = TNull | TBool | TNumber | TString | TArray | TObj
  • JSON↔TOON bidirectional conversion
  • Does NOT provide automatic derivation for case classes
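For orientation, a value ADT with the six cases listed above might look like the following. This is an independent illustration, not toon4s's actual definitions. It also shows one way to print numbers in plain decimal form, without scientific notation, using java.math.BigDecimal.toPlainString:

```scala
// Illustrative value ADT with the same six cases (not copied from toon4s).
sealed trait ToonValue
case object TNull extends ToonValue
final case class TBool(value: Boolean) extends ToonValue
final case class TNumber(value: BigDecimal) extends ToonValue
final case class TString(value: String) extends ToonValue
final case class TArray(items: Vector[ToonValue]) extends ToonValue
final case class TObj(fields: Vector[(String, ToonValue)]) extends ToonValue // Vector keeps key order

// Renders primitives only; numbers are printed without an exponent.
def renderPrimitive(v: ToonValue): Option[String] = v match {
  case TNull      => Some("null")
  case TBool(b)   => Some(b.toString)
  case TNumber(n) => Some(n.bigDecimal.stripTrailingZeros.toPlainString)
  case TString(s) => Some(s) // quoting rules omitted
  case _          => None
}
```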

TypeScript SDK (github.com/toon-format/toon) is the reference implementation with:

  • Complete parser and serializer
  • Schema-aware encoding
  • Comprehensive test suite

6.2 JSON Codec Reference

The JsonBinaryCodecDeriver in zio-blocks serves as the primary reference for implementation patterns:

  • Thread-local caching for recursive types
  • Field info classes for optimized encoding
  • String map for O(1) field lookup during decoding
  • Specialized codecs for primitive arrays
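The O(1) field lookup can be sketched as a precomputed map from every accepted name, including Modifier.alias names, to the field's slot index. The JSON codec uses its own specialized string map. This sketch substitutes a standard immutable Map, and FieldInfo and buildLookup are hypothetical:

```scala
// Hypothetical per-field metadata; aliases mirror Modifier.alias.
final case class FieldInfo(name: String, aliases: List[String], index: Int)

// Maps every accepted name to its slot index; rejects ambiguous schemas up front.
def buildLookup(fields: Seq[FieldInfo]): Map[String, Int] = {
  val entries = fields.flatMap(f => (f.name :: f.aliases).map(_ -> f.index))
  val dupes   = entries.groupBy(_._1).collect { case (k, vs) if vs.size > 1 => k }
  require(dupes.isEmpty, s"duplicate field names: ${dupes.mkString(", ")}")
  entries.toMap
}
```

During decoding, each key read from the input is looked up once. A miss is either skipped or reported as an unexpected field, depending on rejectExtraFields.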

6.3 Test Data

The TOON specification repository includes a test suite at github.com/toon-format/spec/tree/main/tests with:

  • Valid TOON documents
  • Invalid TOON documents with expected errors
  • JSON↔TOON conversion pairs

Appendix A: Example Encodings

Simple Record

case class Person(name: String, age: Int)
val person = Person("Alice", 30)

TOON:

name: Alice
age: 30

Nested Record

case class Address(street: String, city: String)
case class Person(name: String, address: Address)
val person = Person("Alice", Address("123 Main", "Springfield"))

TOON:

name: Alice
address:
  street: 123 Main
  city: Springfield

Uniform Array (Tabular)

case class User(id: Int, name: String)
val users = List(User(1, "Alice"), User(2, "Bob"))

TOON:

[2]{id,name}:
  1,Alice
  2,Bob
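Tabular output is only valid when every row has the same fields with primitive values. The following sketches the uniformity check and the header and row rendering. It assumes rows are already reduced to ordered (field, rendered primitive) pairs and that the default comma delimiter is used:

```scala
// Hypothetical row model: ordered (field, already-rendered primitive) pairs.
type Row = List[(String, String)]

// Returns the tabular lines, or None when the rows are empty or not uniform.
def encodeTabular(rows: List[Row]): Option[List[String]] = rows match {
  case Nil => None
  case first :: _ =>
    val keys = first.map(_._1)
    if (rows.forall(_.map(_._1) == keys)) {
      val header = s"[${rows.size}]{${keys.mkString(",")}}:"
      Some(header :: rows.map(row => "  " + row.map(_._2).mkString(",")))
    } else None
}
```

Requiring identical key order is a simplification. Rows derived from a single case class schema always satisfy it.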

Sealed Trait (Key Discriminator)

sealed trait Pet
case class Cat(name: String, lives: Int) extends Pet
case class Dog(name: String, breed: String) extends Pet

val pet: Pet = Cat("Whiskers", 9)

TOON:

Cat:
  name: Whiskers
  lives: 9

Sealed Trait (Field Discriminator)

// With: .withDiscriminatorKind(DiscriminatorKind.Field("type"))

TOON:

type: Cat
name: Whiskers
lives: 9
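The Key and Field discriminators differ only in how the encoded case body is wrapped. In the following sketch, a hypothetical Discriminator type stands in for DiscriminatorKind and operates on already-rendered body lines:

```scala
// Hypothetical stand-in for DiscriminatorKind, limited to the two tagged modes.
sealed trait Discriminator
case object KeyDisc extends Discriminator
final case class FieldDisc(name: String) extends Discriminator

def wrapCase(caseName: String, body: List[String], disc: Discriminator): List[String] =
  disc match {
    case KeyDisc          => s"$caseName:" :: body.map("  " + _) // nest body under the case name
    case FieldDisc(field) => s"$field: $caseName" :: body        // tag line first, body stays flat
  }
```

Field mode assumes the case encodes as an object. It also assumes the discriminator name does not collide with one of the case's own fields.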

Case Object Enum

sealed trait Color
case object Red extends Color
case object Green extends Color
case object Blue extends Color

val color: Color = Green

TOON (enumValuesAsStrings = true, default):

Green

TOON (enumValuesAsStrings = false):

Green:

Option Types

case class Config(name: String, timeout: Option[Int])
val config = Config("app", Some(30))

TOON (transientNone = true, default):

name: app
timeout: 30

TOON (None value, transientNone = true):

name: app

Appendix B: Error Messages

Error messages should follow the JSON codec pattern with path information:

illegal number with leading zero at: .users[2].age
missing required field "name" at: .config
illegal discriminator at: .event
expected '}' or ',' in tabular header at: .response.data
unexpected field "extra" at: .request  (when rejectExtraFields = true)
array length mismatch: expected 3, got 2 at: .items  (when strictArrayLength = true)
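Path information is easiest to keep as a list of field and index steps, rendered only when an error is reported. The following sketch reproduces the message format above; PathElem and errorAt are hypothetical names:

```scala
// Hypothetical path steps: a named field or an array index.
sealed trait PathElem
final case class FieldElem(name: String) extends PathElem
final case class IndexElem(index: Int) extends PathElem

def renderPath(path: List[PathElem]): String =
  path.map {
    case FieldElem(n) => s".$n"
    case IndexElem(i) => s"[$i]"
  }.mkString

def errorAt(msg: String, path: List[PathElem]): String = s"$msg at: ${renderPath(path)}"
```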

Appendix C: Configuration Quick Reference

| Option | Type | Default | Description |
|---|---|---|---|
| fieldNameMapper | NameMapper | Identity | Field name transformation |
| caseNameMapper | NameMapper | Identity | Case name transformation |
| discriminatorKind | DiscriminatorKind | Key | ADT encoding strategy |
| arrayFormat | ArrayFormat | Auto | Array encoding preference |
| delimiter | Char | ',' | Array value separator |
| rejectExtraFields | Boolean | false | Fail on unknown fields |
| enumValuesAsStrings | Boolean | true | Encode case objects as strings |
| transientNone | Boolean | true | Omit None values |
| requireOptionFields | Boolean | false | Require Option fields |
| transientEmptyCollection | Boolean | true | Omit empty collections |
| requireCollectionFields | Boolean | false | Require collection fields |
| transientDefaultValue | Boolean | true | Omit default values |
| requireDefaultValueFields | Boolean | false | Require fields that have defaults |
| enableKeyFolding | Boolean | false | Collapse single-key object chains into dotted keys |
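The same defaults can be expressed as an immutable configuration class. The following is a sketch only. NameMapper, DiscriminatorKind and ArrayFormat here are minimal stand-ins for the real types, ToonConfig is a hypothetical name, and withDiscriminatorKind mirrors the call used in Appendix A:

```scala
// Minimal stand-ins for the real zio-blocks types (only the cases used below).
sealed trait NameMapper
object NameMapper { case object Identity extends NameMapper }

sealed trait DiscriminatorKind
object DiscriminatorKind {
  case object Key extends DiscriminatorKind
  final case class Field(name: String) extends DiscriminatorKind
}

sealed trait ArrayFormat
object ArrayFormat { case object Auto extends ArrayFormat }

// Hypothetical config class; defaults follow the table above.
final case class ToonConfig(
  fieldNameMapper: NameMapper = NameMapper.Identity,
  caseNameMapper: NameMapper = NameMapper.Identity,
  discriminatorKind: DiscriminatorKind = DiscriminatorKind.Key,
  arrayFormat: ArrayFormat = ArrayFormat.Auto,
  delimiter: Char = ',',
  rejectExtraFields: Boolean = false,
  enumValuesAsStrings: Boolean = true,
  transientNone: Boolean = true,
  requireOptionFields: Boolean = false,
  transientEmptyCollection: Boolean = true,
  requireCollectionFields: Boolean = false,
  transientDefaultValue: Boolean = true,
  requireDefaultValueFields: Boolean = false,
  enableKeyFolding: Boolean = false
) {
  // Builder-style update returning a modified copy.
  def withDiscriminatorKind(kind: DiscriminatorKind): ToonConfig = copy(discriminatorKind = kind)
}
```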
