Skip to content

xc-jp/safe-gen

Repository files navigation

safe-gen

safe-gen on hackage safe-gen on Stackage Nightly

Writing Arbitrary instances is for recursive data is annoying, since it's easy to accidentally write non-terminating generators. A common pattern is to use Gen's implicit size parameter to guide recursion, but this is tedious and error-prone. safe-gen automates this process, making recursive instances painless. safe-gen can be used both to manually write instances or to automatically derive them.

Example

Here's an example of an Arbitrary instance that runs in unbounded time:

data Trie a
  = Leaf a
  | Branch (Trie a) (Trie a) (Trie a)

instance Arbitrary a => Arbitrary (Trie a) where
  arbitrary = oneof
    [ Leaf <$> arbitrary
    , Branch <$> arbitrary <*> arbitrary <*> arbitrary
    ]

And here's how to write it manually using safe-gen:

instance Arbitrary a => Arbitrary (Trie a) where
  arbitrary = runSafeGen go
    where go = Safe.oneof
                 [ Leaf <$> gen arbitrary,
                   Branch <$> go <*> go <*> go
                 ]

Or, derive it automatically:

data Trie a
  = Leaf a
  | Branch (Trie a) (Trie a) (Trie a)
  deriving stock Generic
  deriving anyclass SafeArbitrary
  deriving Arbitrary via FromSafeArbitrary (Trie a)

Usage

safe-gen has a very small API. The two code samples above show it off almost all there is to it.

To manually build SafeGens, use gen :: Gen a -> SafeGen a to construct a single value, compose product types using the Applicative interface, and sum types using frequency or oneof. To run, use runSafeGen :: SafeGen a -> Gen a. arb is a convenient synonym for gen arbitrary.

To derive Arbitrary instances, you need to first derive a SafeArbitrary instance, typically as an empty instance declaration. The FromSafeArbitrary wrapper then provides a convenient way to derive an Arbitrary instance from the SafeArbitrary instance.

Design

As mentioned in the intro, a common pattern is to use Gen's implicit size parameter to guide recursion, but this is tedious and error-prone. safe-gen automates this by

  1. automatically dividing the size parameter between the branches of a product type, and
  2. only going down sufficiently shallow branches of a sum type.

This guarantees that if your generator can terminate in finite time, safe-gen makes sure that it does.

The inverse is not true; (despite the name) safe-gen does not make it impossible to write bottom/unbounded/diverging Gen values. For example, nothing prevents you from writing something like runSafeGen (let go = go in go), which will cause an infinite loop. In some cases, such as runSafeGen (let f = oneof [f] in f), safe-gen will be able to catch the infinite recursion and throw an exception, and in some cases it will give you a lazy infinite generator, but in general, (and unlike generic-arbitrary, about which more below), these issues are left undefined and outside the scope of the library.

Comparison with other packages

generic-arbitrary

generic-arbitrary tries to completely hide the process of writing instances by providing a nice, fully automated experience. You usually add just a deriving clause, and if there is no terminating instance for your data type, you often get a helpful compile-time warning. Unfortunately, it's pretty easy to push generic-arbitrary to its limit, where it will still generate non-terminating instances, without emitting a warning. With safe-gen, derived instances always terminate.

generic-random

generic-random also focuses on deriving instances, but provides (and requires) a bit more control than generic-arbitrary. Its DSL pushes more machinery to compile time than safe-gen, but is also less flexible, and makes it easier to shoot yourself in the foot.

less-arbitrary

less-arbitrary attempts to address some of generic-arbitrary's shortcomings by adding a heuristic based retry-mechanism for generators that seem to loop. It has more tools for hand-writing generators, and so expects more from the user. My main criticism is that, compared to safe-gen, its API is relatively large and complicated, and replaces the problem of infinitely looping generators with generators that infinitely retry.

Implementation details

A SafeGen value is a lazy, potentially infinite n-ary tree, whose branches are product or sum types, and leaves are Gen values.

For every node in the tree, we define a shallowness value, or minimum depth, which is the smallest number of child nodes to traverse to find a leaf. More precisely, for sum types the shallowness is one plus the shallowness of its shallowest child, for product types it is one plus the shallowness of its least shallow child, and for leaves it is zero. Since trees are potentially infinite, we take care to calculate this value lazily as a Peano number.

When we "run" this tree, i.e. go from SafeGen to Gen, we start by asking for Gen's size parameter. Then, with this size parameter in hand, we traverse the tree as follows: For sum branches, we only consider children whose shallowness is at most the size parameter, or if none exist, the shallowest branch(es). For product branches, we divide the size between the number of children, and recurse.

About

Recursive Arbitrary instances without headaches

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published