canonize

Aggressive type coercion for Zod schemas.

Canonize exists for the messy middle ground between "invalid" and "usable." When working with LLM tool calls, you might get a parameters object that does not match your schema exactly, but it is close enough to work. Canonize takes any Zod schema and returns a version that tries its hardest to coerce incoming input into the shape you expect.

The Problem

You've defined a beautiful Zod schema:

const userSchema = z.object({
  age: z.number(),
  active: z.boolean(),
  tags: z.array(z.string()),
});

Then reality hits. Your API receives:

{ age: "30", active: "yes", tags: "admin,user" }

Zod's built-in z.coerce helps with simple cases, but z.coerce.boolean() treats every non-empty string as true (so "no" and "false" come out as true), and nothing in z.coerce will split "admin,user" into an array or handle the dozen other formats your data might arrive in.

You're left writing preprocessing logic, custom transforms, or wrapper functions for every schema. The business logic gets buried under input normalization.

The Solution

Wrap your schema with canonize() and move on:

import { canonize } from 'canonize';
import { z } from 'zod';

const userSchema = canonize(
  z.object({
    age: z.number(),
    active: z.boolean(),
    tags: z.array(z.string()),
  }),
);

// All of these now work:
userSchema.parse({ age: '30', active: 'yes', tags: 'admin,user' });
userSchema.parse({ age: 30.5, active: 1, tags: ['admin'] });
userSchema.parse({ age: '30px', active: 'enabled', tags: '["admin"]' });

Canonize handles the messy real-world inputs so your schema can focus on validation:

  • "30", "30px", "30.5" → 30 (number)
  • "yes", "true", "on", "1", 1 → true (boolean)
  • "admin,user", '["admin","user"]' → ["admin", "user"] (array)
  • "2024-01-15", 1705276800000, "now" → Date object
  • Nested objects, unions, discriminated unions, intersections: all coerced recursively

When to Use Canonize

  • API endpoints receiving form data, query strings, or JSON from unknown clients
  • Configuration files where users write enabled: yes instead of enabled: true
  • LLM tool calls where the model outputs "42" instead of 42
  • Legacy system integration with inconsistent data formats
  • CSV/spreadsheet imports where everything is a string

Installation

npm install canonize zod
# or
bun add canonize zod
# or
pnpm add canonize zod

Quick Start

import { canonize } from 'canonize';
import { z } from 'zod';

// Wrap any Zod schema
const schema = canonize(
  z.object({
    name: z.string(),
    age: z.number(),
    active: z.boolean(),
    tags: z.array(z.string()),
  }),
);

// Coercion happens automatically
schema.parse({ name: 123, age: '30', active: 'yes', tags: 'a,b,c' });
// { name: '123', age: 30, active: true, tags: ['a', 'b', 'c'] }

Armorer Integration

Canonize works cleanly with Armorer tool schemas. Wrap the schema (or raw shape) with canonize so LLM arguments are coerced before execution.

import { createTool } from 'armorer';
import { canonize } from 'canonize';
import { z } from 'zod';

const addNumbers = createTool({
  name: 'add-numbers',
  description: 'Add two numbers together',
  schema: canonize({
    a: z.number(),
    b: z.number(),
  }),
  async execute({ a, b }) {
    return a + b;
  },
});

API Reference

Core Function

canonize<T>(schema: T): T

Wraps a Zod schema with aggressive type coercion. Returns the same schema type for full TypeScript inference.

import { canonize } from 'canonize';
import { z } from 'zod';

const numberSchema = canonize(z.number());
numberSchema.parse('42'); // 42
numberSchema.parse('42px'); // 42
numberSchema.parse('1,234'); // 1234

const boolSchema = canonize(z.boolean());
boolSchema.parse('yes'); // true
boolSchema.parse('0'); // false
boolSchema.parse('enabled'); // true

const arraySchema = canonize(z.array(z.number()));
arraySchema.parse('1,2,3'); // [1, 2, 3]
arraySchema.parse('[1,2,3]'); // [1, 2, 3]
arraySchema.parse(42); // [42]

Supported Zod types:

  • Primitives: string, number, boolean, bigint, date, null, nan
  • Collections: array, object, tuple, record, map, set
  • Composites: union, discriminatedUnion, intersection
  • Special: enum, literal, any, unknown, custom
  • Wrappers: optional, nullable, default, catch, readonly, lazy

Diagnostics

safeParseWithReport(schema, input)

Coerces inputs and returns a report of what changed alongside the parse result.

import { safeParseWithReport } from 'canonize';
import { z } from 'zod';

const schema = z.object({ count: z.number(), enabled: z.boolean() });
const result = safeParseWithReport(schema, { count: '42', enabled: 'yes' });

if (result.success) {
  console.log(result.data);
}
console.log(result.report.warnings);

coerceWithReport(schema, input)

Returns the coerced value plus warnings without running validation.

createRepairHints(error, options?)

Generates compact, LLM-friendly suggestions from a ZodError.

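import { createRepairHints } from 'canonize';

// `result` is the value returned by safeParseWithReport in the example above.
if (!result.success) {
  const hints = createRepairHints(result.error);
}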

Type Detection Utilities

getZodTypeName(schema: ZodTypeAny): string

Returns the Zod type name for a schema. Useful for building custom coercion logic.

import { getZodTypeName } from 'canonize';
import { z } from 'zod';

getZodTypeName(z.string()); // 'string'
getZodTypeName(z.array(z.number())); // 'array'
getZodTypeName(z.object({})); // 'object'
getZodTypeName(z.string().optional()); // 'optional'

unwrapSchema(schema: ZodTypeAny): ZodTypeAny

Removes wrapper types (optional, nullable, default, catch, readonly) to get the inner schema.

import { unwrapSchema, getZodTypeName } from 'canonize';
import { z } from 'zod';

const wrapped = z.string().optional().nullable().default('hello');
const inner = unwrapSchema(wrapped);
getZodTypeName(inner); // 'string'

Circular Reference Tracking

CircularTracker

A WeakSet-based tracker for detecting circular references during coercion. Prevents infinite loops when processing self-referential data structures.

import { CircularTracker } from 'canonize';

const tracker = new CircularTracker();
const obj: { self: unknown } = { self: null }; // typed so the self-assignment below compiles
obj.self = obj; // circular reference

tracker.has(obj); // false
tracker.add(obj);
tracker.has(obj); // true
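
To make the idea concrete, here is a minimal sketch of WeakSet-based cycle detection during a recursive walk. It uses a plain WeakSet rather than CircularTracker itself, and the helper name sketchAssertAcyclic is hypothetical; treat it as an illustration of the technique, not as canonize's internals.

```typescript
// Sketch only: plain WeakSet logic, not canonize's CircularTracker class.
// `sketchAssertAcyclic` is a hypothetical helper name.
function sketchAssertAcyclic(
  value: unknown,
  ancestors: WeakSet<object> = new WeakSet<object>(),
): void {
  // Primitives and null cannot take part in a cycle.
  if (typeof value !== 'object' || value === null) return;

  // Meeting an object that is already on the current path means we looped back.
  if (ancestors.has(value)) {
    throw new Error('Circular reference detected');
  }

  ancestors.add(value);
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    sketchAssertAcyclic(record[key], ancestors);
  }
  // Remove on the way out so shared (non-circular) references stay legal.
  ancestors.delete(value);
}

const sketchShared = { leaf: true };
sketchAssertAcyclic({ left: sketchShared, right: sketchShared }); // passes: shared, not circular

const sketchLoop: { self: unknown } = { self: null };
sketchLoop.self = sketchLoop;
// sketchAssertAcyclic(sketchLoop) throws "Circular reference detected"
```

Removing each object from the set after its children are visited is a design choice in this sketch: it flags only true cycles and lets the same object appear in two sibling branches.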

Schema Creation Helpers

createCanonizePrimitive(primitive: CanonizePrimitive): ZodTypeAny

Creates a coerced Zod schema for a primitive type.

import { createCanonizePrimitive } from 'canonize';

const stringSchema = createCanonizePrimitive('string');
const numberSchema = createCanonizePrimitive('number');
const booleanSchema = createCanonizePrimitive('boolean');
const nullSchema = createCanonizePrimitive('null');

Supported primitives: 'string' | 'number' | 'boolean' | 'null'

createCanonizeSchema<T>(schema: T): ZodObject

Creates a Zod object schema from a record of primitive type names.

import { createCanonizeSchema } from 'canonize';

const schema = createCanonizeSchema({
  name: 'string',
  age: 'number',
  active: 'boolean',
});

schema.parse({ name: 123, age: '30', active: 'yes' });
// { name: '123', age: 30, active: true }

Constants

ZodType

Object containing Zod type name constants for use in type detection.

import { ZodType } from 'canonize';

ZodType.STRING; // 'string'
ZodType.NUMBER; // 'number'
ZodType.ARRAY; // 'array'
ZodType.OBJECT; // 'object'
ZodType.UNION; // 'union'
// ... and more

Available constants:

Category | Constants
Primitives | STRING, NUMBER, BOOLEAN, DATE, BIGINT, NULL, UNDEFINED, NAN
Collections | ARRAY, OBJECT, TUPLE, RECORD, MAP, SET
Composites | UNION, DISCRIMINATED_UNION, INTERSECTION
Enums | ENUM, NATIVE_ENUM, LITERAL
Wrappers | OPTIONAL, NULLABLE, DEFAULT, CATCH, LAZY, READONLY, BRANDED
Special | ANY, UNKNOWN, NEVER, CUSTOM

Types

CanonizeSchema<T>

Type alias representing a canonized schema. Preserves the original schema's type information.

import type { CanonizeSchema } from 'canonize';
import { z } from 'zod';

type MySchema = CanonizeSchema<z.ZodObject<{ name: z.ZodString }>>;

CanonizePrimitive

Union type for primitive type names accepted by createCanonizePrimitive.

import type { CanonizePrimitive } from 'canonize';

const primitive: CanonizePrimitive = 'string'; // 'string' | 'number' | 'boolean' | 'null'

Coercion Rules

String

Input | Output
"hello" | "hello"
123 | "123"
true | "true"
null, undefined | ""
[1, 2, 3] | "1, 2, 3"
{ key: "value" } | "key: value"
new Date() | ISO string
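
The string rules can be read as a small decision ladder. The sketch below mirrors the table in plain TypeScript; sketchToString is a hypothetical helper, not a canonize export, and joining multiple object keys with ", " is an assumption (the table only shows a single key).

```typescript
// Sketch only: mirrors the String table; not canonize's implementation.
function sketchToString(input: unknown): string {
  if (input === null || input === undefined) return '';
  if (typeof input === 'string') return input;
  // Dates become ISO strings (toISOString throws for an invalid Date).
  if (input instanceof Date) return input.toISOString();
  // Arrays are joined with ", " after converting each element.
  if (Array.isArray(input)) return input.map(sketchToString).join(', ');
  if (typeof input === 'object') {
    const record = input as Record<string, unknown>;
    // Assumption: several keys are joined with ", " as well.
    return Object.keys(record)
      .map((key) => `${key}: ${sketchToString(record[key])}`)
      .join(', ');
  }
  // Numbers, booleans, bigints and the rest use their default string form.
  return String(input);
}

sketchToString(123); // '123'
sketchToString([1, 2, 3]); // '1, 2, 3'
sketchToString({ key: 'value' }); // 'key: value'
```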

Number

Input | Output
"42" | 42
"42px", "42em" | 42
"1,234", "1_234" | 1234
"1e5" | 100000
true / false | 1 / 0
[42] | 42
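
One way to picture these number rules is "strip separators, then read the leading numeric prefix". The following sketch does exactly that; sketchToNumber is a hypothetical name, and returning undefined when nothing numeric is found is an assumption made for the illustration.

```typescript
// Sketch only: mirrors the Number table; not canonize's implementation.
function sketchToNumber(input: unknown): number | undefined {
  if (typeof input === 'number') return Number.isNaN(input) ? undefined : input;
  if (typeof input === 'boolean') return input ? 1 : 0;
  // A single-element array is unwrapped and coerced recursively.
  if (Array.isArray(input) && input.length === 1) return sketchToNumber(input[0]);
  if (typeof input !== 'string') return undefined;

  // Drop thousands separators such as "1,234" or "1_234".
  const cleaned = input.trim().replace(/[,_]/g, '');
  // Read the leading number (with optional exponent) and ignore a unit suffix.
  const match = /^[+-]?(\d+\.?\d*|\.\d+)(e[+-]?\d+)?/i.exec(cleaned);
  return match ? Number(match[0]) : undefined;
}

sketchToNumber('42px'); // 42
sketchToNumber('1,234'); // 1234
sketchToNumber('1e5'); // 100000
```

Note that "42em" still yields 42 here: the exponent group only matches when digits follow the "e", so the unit is ignored rather than misread as scientific notation.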

Boolean

Input | Output
"true", "yes", "on", "y", "t", "enabled", "1" | true
"false", "no", "off", "n", "f", "disabled", "0" | false
1, non-zero numbers | true
0 | false
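
As a rough illustration, the boolean table maps onto two word lists plus a numeric check. sketchToBoolean is a hypothetical helper; matching case-insensitively and returning undefined for unrecognized input are assumptions of this sketch.

```typescript
// Sketch only: mirrors the Boolean table; not canonize's implementation.
const SKETCH_TRUE_WORDS = new Set(['true', 'yes', 'on', 'y', 't', 'enabled', '1']);
const SKETCH_FALSE_WORDS = new Set(['false', 'no', 'off', 'n', 'f', 'disabled', '0']);

function sketchToBoolean(input: unknown): boolean | undefined {
  if (typeof input === 'boolean') return input;
  // 0 is false, any other real number is true.
  if (typeof input === 'number') return Number.isNaN(input) ? undefined : input !== 0;
  if (typeof input === 'string') {
    // Assumption: surrounding whitespace and letter case are ignored.
    const word = input.trim().toLowerCase();
    if (SKETCH_TRUE_WORDS.has(word)) return true;
    if (SKETCH_FALSE_WORDS.has(word)) return false;
  }
  // Anything else is left for validation to reject.
  return undefined;
}

sketchToBoolean('yes'); // true
sketchToBoolean('disabled'); // false
sketchToBoolean(7); // true
```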

Date

Input | Output
ISO string | new Date(string)
Unix timestamp (ms) | new Date(number)
"now" | Current time
"today" | Start of today
"yesterday" | Start of yesterday
"tomorrow" | Start of tomorrow
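
The keyword handling is the least obvious part of the date rules, so here is a small sketch of it. The names sketchToDate and sketchStartOfDay are hypothetical, the injectable now parameter exists only to make the example deterministic, and computing "start of day" in local time is an assumption.

```typescript
// Sketch only: mirrors the Date table; not canonize's implementation.
function sketchStartOfDay(reference: Date, offsetDays: number): Date {
  const day = new Date(reference.getTime());
  day.setHours(0, 0, 0, 0); // assumption: local midnight
  day.setDate(day.getDate() + offsetDays);
  return day;
}

function sketchToDate(input: unknown, now: Date = new Date()): Date | undefined {
  if (input instanceof Date) return input;
  // Numbers are treated as Unix timestamps in milliseconds.
  if (typeof input === 'number') return new Date(input);
  if (typeof input !== 'string') return undefined;

  switch (input.trim().toLowerCase()) {
    case 'now':
      return new Date(now.getTime());
    case 'today':
      return sketchStartOfDay(now, 0);
    case 'yesterday':
      return sketchStartOfDay(now, -1);
    case 'tomorrow':
      return sketchStartOfDay(now, 1);
  }

  // Everything else goes through the Date constructor (ISO strings and similar).
  const parsed = new Date(input);
  return Number.isNaN(parsed.getTime()) ? undefined : parsed;
}

const sketchNoon = new Date(2024, 0, 15, 12, 30);
sketchToDate('tomorrow', sketchNoon); // local midnight on 2024-01-16
sketchToDate(1705276800000); // 2024-01-15T00:00:00.000Z
```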

Array

Input | Output
"1,2,3" | ["1", "2", "3"]
"[1,2,3]" (JSON) | [1, 2, 3]
null, "" | []
Set, Map | Array from values
Single value | [value]
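
A sketch of the array rules, before any per-element coercion is applied, might look like the following. sketchToArray is a hypothetical helper, and treating undefined like null is an assumption that the table does not state.

```typescript
// Sketch only: mirrors the Array table; not canonize's implementation.
function sketchToArray(input: unknown): unknown[] {
  if (Array.isArray(input)) return input;
  // Assumption: undefined is treated like null.
  if (input === null || input === undefined) return [];
  if (input instanceof Set) return Array.from(input.values());
  if (input instanceof Map) return Array.from(input.values());

  if (typeof input === 'string') {
    const text = input.trim();
    if (text === '') return [];
    // Try JSON first when the string looks like a JSON array.
    if (text.charAt(0) === '[') {
      try {
        const parsed: unknown = JSON.parse(text);
        if (Array.isArray(parsed)) return parsed;
      } catch {
        // Not valid JSON: fall through to comma splitting.
      }
    }
    return text.split(',').map((part) => part.trim());
  }

  // Any other single value is wrapped.
  return [input];
}

sketchToArray('1,2,3'); // ['1', '2', '3']
sketchToArray('[1,2,3]'); // [1, 2, 3]
sketchToArray(42); // [42]
```

The split result stays as strings here; in the real library the element schema (for example z.number()) would then decide how each item is coerced.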

Object

Input | Output
JSON string | Parsed object
Map | Object.fromEntries()
null, undefined | {}
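
For completeness, the object rules can be sketched the same way. sketchToObject is a hypothetical helper; returning undefined for input it cannot interpret, and converting Map keys with String(), are assumptions of this illustration.

```typescript
// Sketch only: mirrors the Object table; not canonize's implementation.
function sketchToObject(input: unknown): Record<string, unknown> | undefined {
  if (input === null || input === undefined) return {};

  if (input instanceof Map) {
    // Same result as Object.fromEntries(map) for string keys.
    const fromMap: Record<string, unknown> = {};
    input.forEach((value: unknown, key: unknown) => {
      fromMap[String(key)] = value;
    });
    return fromMap;
  }

  if (typeof input === 'string') {
    try {
      const parsed: unknown = JSON.parse(input);
      if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
        return parsed as Record<string, unknown>;
      }
    } catch {
      // Not JSON: nothing to recover.
    }
    return undefined;
  }

  if (typeof input === 'object' && !Array.isArray(input)) {
    return input as Record<string, unknown>;
  }
  return undefined;
}

sketchToObject('{"a": 1}'); // { a: 1 }
sketchToObject(new Map([['a', 1]])); // { a: 1 }
sketchToObject(null); // {}
```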

Union

Coercion tries options in order:

  1. Exact primitive match (preserves numbers in string | number)
  2. Object/record schemas for plain objects
  3. Array schemas for arrays and CSV strings
  4. Boolean schemas for boolean-like strings
  5. First union member, then remaining members

Discriminated Union

Uses the discriminator field to select the variant, then coerces fields:

const schema = canonize(
  z.discriminatedUnion('type', [
    z.object({ type: z.literal('a'), value: z.number() }),
    z.object({ type: z.literal('b'), value: z.string() }),
  ]),
);

schema.parse({ type: 'a', value: '42' }); // { type: 'a', value: 42 }

Tool Parameter Helpers

The canonize/tool-parameters module provides schema builders for LLM tool definitions. These handle malformed AI outputs gracefully with sensible defaults.

import {
  boolean,
  number,
  string,
  selector,
  containerSelector,
  collection,
  numbers,
  choices,
  count,
  url,
  exportFormat,
  imageFormat,
  links,
  linkMetadataSchema,
  type LinkMetadata,
} from 'canonize/tool-parameters';

boolean(defaultValue)

const enabled = boolean(true);
enabled.parse('yes'); // true
enabled.parse('FALSE'); // false
enabled.parse(1); // true
enabled.parse(undefined); // true (default)

number(defaultValue, options?)

const limit = number(10, { min: 1, max: 100, int: true }); // named `limit` to avoid shadowing the imported count()
limit.parse('42px'); // 42
limit.parse('1,234'); // 1234
limit.parse(undefined); // 10 (default)

string()

const name = string();
name.parse('  hello  '); // 'hello' (trimmed)
name.parse(123); // '123'

selector()

CSS selector string, trimmed and validated non-empty.

const sel = selector();
sel.parse('  .class  '); // '.class'

containerSelector()

Container selector with intelligent coercion for common LLM mistakes:

const container = containerSelector();
container.parse('main'); // 'main'
container.parse('*'); // null (wildcard → entire document)
container.parse('null'); // null
container.parse('a'); // null (link selector → entire document)
container.parse('body a'); // 'body' (extracts container)
container.parse('all'); // null (natural language)

collection(...defaultValues)

String array with flexible separators (comma, semicolon, pipe, newline):

const tags = collection('default');
tags.parse('foo,bar'); // ['foo', 'bar']
tags.parse('foo;bar'); // ['foo', 'bar']
tags.parse('foo|bar'); // ['foo', 'bar']
tags.parse('foo\nbar'); // ['foo', 'bar']
tags.parse(undefined); // ['default']
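
The separator handling amounts to a single split on a character class. The sketch below shows that idea in isolation; sketchSplitCollection is a hypothetical helper, and trimming items and dropping empty ones are assumptions.

```typescript
// Sketch only: illustrates multi-separator splitting; not the collection() implementation.
function sketchSplitCollection(input: unknown, defaults: string[] = []): string[] {
  if (input === undefined || input === null) return defaults;

  const parts = Array.isArray(input)
    ? input.map((item) => String(item))
    : String(input).split(/[,;|\n]/); // comma, semicolon, pipe, newline

  // Assumption: whitespace is trimmed and empty pieces are discarded.
  return parts.map((part) => part.trim()).filter((part) => part !== '');
}

sketchSplitCollection('foo;bar'); // ['foo', 'bar']
sketchSplitCollection('foo|bar\nbaz'); // ['foo', 'bar', 'baz']
sketchSplitCollection(undefined, ['default']); // ['default']
```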

numbers(options?)

Number array with flexible input handling:

const ids = numbers({ int: true, min: 0 });
ids.parse('1,2,3'); // [1, 2, 3]
ids.parse([1, '2', 3]); // [1, 2, 3]
ids.parse(undefined); // []

choices(values, defaultValue?)

Enum with fuzzy matching (case-insensitive, prefix, contains):

const sort = choices(['date', 'name', 'size'], 'date');
sort.parse('Date'); // 'date' (case-insensitive)
sort.parse('nam'); // 'name' (prefix match)
sort.parse('date_desc'); // 'date' (contains match)
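
A plausible way to get this behavior is to try progressively looser matches. The sketch below is one such ladder; sketchMatchChoice is a hypothetical helper, and the exact order (exact, then prefix, then contains) is an assumption inferred from the examples above.

```typescript
// Sketch only: one possible fuzzy-matching ladder; not the choices() implementation.
function sketchMatchChoice<T extends string>(
  values: readonly T[],
  input: unknown,
  fallback?: T,
): T | undefined {
  if (input === undefined || input === null) return fallback;
  const text = String(input).trim().toLowerCase();
  if (text === '') return fallback;

  // 1. Case-insensitive exact match.
  const exact = values.filter((value) => value.toLowerCase() === text)[0];
  if (exact !== undefined) return exact;

  // 2. Prefix match: the input is the beginning of a choice ("nam" -> "name").
  const prefix = values.filter((value) => value.toLowerCase().indexOf(text) === 0)[0];
  if (prefix !== undefined) return prefix;

  // 3. Contains match: the input mentions a choice ("date_desc" -> "date").
  const contained = values.filter((value) => text.indexOf(value.toLowerCase()) !== -1)[0];
  if (contained !== undefined) return contained;

  return fallback;
}

sketchMatchChoice(['date', 'name', 'size'], 'Date'); // 'date'
sketchMatchChoice(['date', 'name', 'size'], 'nam'); // 'name'
sketchMatchChoice(['date', 'name', 'size'], 'date_desc'); // 'date'
```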

count()

Number for count/statistic values (defaults to 0):

const total = count();
total.parse('42'); // 42
total.parse(null); // 0

url()

URL string with cleanup (removes wrapping quotes, brackets):

const link = url();
link.parse('"https://example.com"'); // 'https://example.com'
link.parse('<https://example.com>'); // 'https://example.com'
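
LLMs often wrap URLs in quotes or Markdown-style angle brackets, which is the case this cleanup targets. A minimal sketch of the idea follows; sketchCleanUrl is a hypothetical helper, and the particular set of wrapper pairs it strips is an assumption.

```typescript
// Sketch only: illustrates wrapper stripping; not the url() implementation.
function sketchCleanUrl(input: string): string {
  // Assumed wrapper pairs: double quotes, single quotes, angle and square brackets.
  const wrappers: Array<[string, string]> = [
    ['"', '"'],
    ["'", "'"],
    ['<', '>'],
    ['[', ']'],
  ];

  let text = input.trim();
  let changed = true;
  // Keep peeling until no wrapper pair matches, so nested wrappers are handled.
  while (changed) {
    changed = false;
    for (const [opening, closing] of wrappers) {
      const wrapped =
        text.length >= 2 &&
        text.charAt(0) === opening &&
        text.charAt(text.length - 1) === closing;
      if (wrapped) {
        text = text.slice(1, -1).trim();
        changed = true;
      }
    }
  }
  return text;
}

sketchCleanUrl('"https://example.com"'); // 'https://example.com'
sketchCleanUrl('<https://example.com>'); // 'https://example.com'
```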

exportFormat(options?)

Export format enum (markdown, csv, json):

exportFormat(); // defaults to 'markdown'
exportFormat({ defaultValue: 'csv' }); // defaults to 'csv'
exportFormat({ includeJson: false }); // 'markdown' | 'csv' only

imageFormat(defaultValue?)

Image format enum (jpeg, png):

imageFormat(); // defaults to 'png'
imageFormat('jpeg'); // defaults to 'jpeg'

links() and linkMetadataSchema

Array of link metadata objects:

const linkList = links();
linkList.parse([{ title: 'Example', url: 'https://example.com' }]);

// Or use the schema directly
import { linkMetadataSchema, type LinkMetadata } from 'canonize/tool-parameters';

const link: LinkMetadata = {
  title: 'Example',
  url: 'https://example.com',
  source: 'html', // optional: 'html' | 'markdown' | 'element' | 'link'
  rel: 'noopener', // optional
  target: '_blank', // optional
  referrerPolicy: null, // optional
  text: 'Click here', // optional: raw link text
};

Advanced Usage

Lazy Schemas (Recursive Types)

const TreeNode = canonize(
  z.lazy(() =>
    z.object({
      value: z.number(),
      children: z.array(TreeNode).optional(),
    }),
  ),
);

TreeNode.parse({
  value: '1',
  children: [{ value: '2' }, { value: '3' }],
});

Intersection Types

const schema = canonize(
  z.intersection(z.object({ a: z.number() }), z.object({ b: z.string() })),
);

schema.parse({ a: '1', b: 2 }); // { a: 1, b: '2' }

Map and Set

const mapSchema = canonize(z.map(z.string(), z.number()));
mapSchema.parse([
  ['a', '1'],
  ['b', '2'],
]); // Map { 'a' => 1, 'b' => 2 }
mapSchema.parse({ a: '1', b: '2' }); // Map { 'a' => 1, 'b' => 2 }

const setSchema = canonize(z.set(z.number()));
setSchema.parse([1, '2', 3]); // Set { 1, 2, 3 }
setSchema.parse('1,2,3'); // Set { 1, 2, 3 }

Error Handling

Coercion errors are caught internally, and the original value then passes through to Zod for validation:

const schema = canonize(z.number());

schema.parse('42'); // 42 (coercion succeeds)
schema.parse('not a number'); // throws ZodError (coercion fails, Zod validates original)
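
The pass-through behavior can be pictured as a try/catch around the coercion step. In the sketch below, all three function names are hypothetical stand-ins (neither canonize nor Zod is involved); it only illustrates the control flow described above.

```typescript
// Sketch only: hypothetical stand-ins for the coercion and validation steps.
function sketchCoerceOrPassThrough(
  coerce: (input: unknown) => unknown,
  input: unknown,
): unknown {
  try {
    return coerce(input);
  } catch {
    // Coercion failed: hand the untouched input to validation instead.
    return input;
  }
}

// A deliberately strict coercer that throws when it cannot produce a number.
function sketchStrictNumber(input: unknown): number {
  const value = typeof input === 'string' && input.trim() !== '' ? Number(input) : NaN;
  if (Number.isNaN(value)) throw new TypeError('cannot coerce to number');
  return value;
}

// Stand-in for schema validation.
function sketchValidateNumber(value: unknown): number {
  if (typeof value !== 'number') {
    throw new Error(`Expected number, received ${typeof value}`);
  }
  return value;
}

sketchValidateNumber(sketchCoerceOrPassThrough(sketchStrictNumber, '42')); // 42
// With 'not a number', the coercer throws internally, the original string is
// passed on, and sketchValidateNumber reports the type mismatch.
```

The benefit of this arrangement is that callers only ever see one kind of failure: the validation error about the original input.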

Circular references throw immediately:

const obj: { self: unknown } = { self: null }; // typed so the self-assignment below compiles
obj.self = obj;

const schema = canonize(z.object({ self: z.any() }));
schema.parse(obj); // throws Error: Circular reference detected

StandardSchema Compatibility

Canonize is fully compatible with StandardSchema, the interoperability spec implemented by Zod, Valibot, ArkType, and others.

Since Zod v4 implements StandardSchema, all canonized schemas have the ~standard property:

const schema = canonize(z.object({ count: z.number() }));

// Use with any StandardSchema-aware tool
const result = await schema['~standard'].validate({ count: '42' });
// { value: { count: 42 } }

Canonize is Zod-specific because intelligent coercion requires schema introspection (knowing field types). StandardSchema only provides a validate() function without type information.
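
To show what "StandardSchema-aware" means in practice, here is a sketch of a vendor-neutral helper that only touches the ~standard property. The Sketch-prefixed types are a simplified local copy of the spec's interface written for this example, and sketchCountSchema is a hand-rolled stand-in so the snippet runs without Zod or canonize; a canonized schema would be passed in the same position.

```typescript
// Sketch only: simplified local copy of the Standard Schema v1 shape.
interface SketchIssue {
  readonly message: string;
}

type SketchResult<Output> =
  | { readonly value: Output; readonly issues?: undefined }
  | { readonly issues: ReadonlyArray<SketchIssue> };

interface SketchStandardSchema<Output> {
  readonly '~standard': {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (
      value: unknown,
    ) => SketchResult<Output> | Promise<SketchResult<Output>>;
  };
}

// Works with any schema exposing `~standard`, regardless of the library behind it.
async function sketchValidate<Output>(
  schema: SketchStandardSchema<Output>,
  input: unknown,
): Promise<Output> {
  const result = await schema['~standard'].validate(input);
  if (result.issues) {
    throw new Error(result.issues.map((issue) => issue.message).join('; '));
  }
  return result.value;
}

// Hypothetical stand-in schema that coerces `count`, used only for the demo.
const sketchCountSchema: SketchStandardSchema<{ count: number }> = {
  '~standard': {
    version: 1,
    vendor: 'sketch',
    validate(value) {
      const count = Number((value as { count?: unknown } | null)?.count);
      return Number.isFinite(count)
        ? { value: { count } }
        : { issues: [{ message: 'count must be numeric' }] };
    },
  },
};

sketchValidate(sketchCountSchema, { count: '42' }).then((data) => {
  console.log(data.count); // 42
});
```

The helper sees only validate(), which is also why the coercion itself cannot be built on StandardSchema: nothing in this interface exposes the field types.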


License

MIT
