Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different reifyGraph behavior between data-reify-0.6.1 and 0.6.2 #11

Closed
RyanGlScott opened this issue Oct 2, 2020 · 3 comments
Closed

Comments

@RyanGlScott
Copy link
Member

RyanGlScott commented Oct 2, 2020

#8 appears to have changed the user-visible behavior of the reifyGraph function. Here is a minimal example that demonstrates the difference in behavior between 0.6.1 and 0.6.2:

{-# LANGUAGE TypeFamilies #-}
module Main (main) where

import Data.Reify

data State = State Char [State]
  deriving Show

s1, s2, s3 :: State
s1 = State 'a' [s2,s3]
s2 = State 'b' [s1,s2]
s3 = State 'c' [s2,s1]

data StateDeRef r = StateDeRef Char [r]
  deriving Show

instance MuRef State where
  type DeRef State = StateDeRef
  mapDeRef f (State a tr) = StateDeRef a <$> traverse f tr

main :: IO ()
main = reifyGraph s1 >>= print

With data-reify-0.6.1, this outputs:

let [(1,StateDeRef 'a' [2,3]),(3,StateDeRef 'c' [2,1]),(2,StateDeRef 'b' [1,2])] in 1

But with data-reify-0.6.2, this outputs:

let [(1,StateDeRef 'a' [2,3]),(3,StateDeRef 'c' [2,1]),(2,StateDeRef 'b' [1,2]),(2,StateDeRef 'b' [1,2])] in 1

Among other things, this breaks ekmett/ad#89.

@RyanGlScott
Copy link
Member Author

One possible way to remedy this is to turn the IntSet argument to findNodes into an MVar IntSet. That is, apply the following patch:

diff --git a/Data/Reify.hs b/Data/Reify.hs
index c3c44ec..9b8bdbc 100644
--- a/Data/Reify.hs
+++ b/Data/Reify.hs
@@ -65,7 +65,8 @@ reifyWithContext :: (MuRef s)
           -> s
           -> IO (Graph (DeRef s))
 reifyWithContext rt1 rt2 uVar j = do
-  root <- findNodes rt1 rt2 uVar S.empty j
+  nodeSetVar <- newMVar S.empty
+  root <- findNodes rt1 rt2 uVar nodeSetVar j
   pairs <- readMVar rt2
   return (Graph pairs root)

@@ -73,24 +74,28 @@ findNodes :: (MuRef s)
           => MVar (HashMap DynStableName Int)
           -> MVar [(Int,DeRef s Int)]
           -> MVar Int
-          -> S.IntSet
+          -> MVar S.IntSet
           -> s
           -> IO Int
-findNodes rt1 rt2 uVar nodeSet !j = do
+findNodes rt1 rt2 uVar nodeSetVar !j = do
         st <- makeDynStableName j
         tab <- takeMVar rt1
+        nodeSet <- takeMVar nodeSetVar
         case M.lookup st tab of
           Just var -> do putMVar rt1 tab
                          if var `S.member` nodeSet
-                           then return var
-                           else do res <- mapDeRef (findNodes rt1 rt2 uVar (S.insert var nodeSet)) j
+                           then do putMVar nodeSetVar nodeSet
+                                   return var
+                           else do putMVar nodeSetVar $ S.insert var nodeSet
+                                   res <- mapDeRef (findNodes rt1 rt2 uVar nodeSetVar) j
                                    tab' <- takeMVar rt2
                                    putMVar rt2 $ (var,res) : tab'
                                    return var
           Nothing ->
                     do var <- newUnique uVar
                        putMVar rt1 $ M.insert st var tab
-                       res <- mapDeRef (findNodes rt1 rt2 uVar (S.insert var nodeSet)) j
+                       putMVar nodeSetVar $ S.insert var nodeSet
+                       res <- mapDeRef (findNodes rt1 rt2 uVar nodeSetVar) j
                        tab' <- takeMVar rt2
                        putMVar rt2 $ (var,res) : tab'
                        return var

This restores the original behavior of the program in #11 (comment), and it makes the behavior of each of the existing test cases coincide between data-reify-0.6.1 and data-reify-0.6.2.

@RyanGlScott
Copy link
Member Author

As for why this is the correct thing to do, let's first examine the old type signature for findNodes:

findNodes :: (MuRef s)
          => MVar (HashMap DynStableName Int)
          -> MVar [(Int,DeRef s Int)]
          -> MVar Int
          -> s
          -> IO Int

There are three MVar arguments:

  1. The MVar (HashMap DynStableName Int) argument keeps track of all the unique numbers encountered so far, mapping from Haskell stable names to their corresponding unique numbers in the Graph that is being built.
  2. The MVar [(Int,DeRef s Int)] accumulates the key-value pairs in the Graph that is being built. The keys are unique numbers and the values are recursive occurrences of s.
  3. The MVar Int is a supply of unique numbers. Any time a new recursive occurrence of s is encountered, a fresh unique is taken from this supply, causing the supply to be incremented.

Critically, each of these arguments use MVars because each recursive invocation of findNodes can cause the state of the underlying values to change. For example, in this part of findNodes:

res <- mapDeRef (findNodes rt1 rt2 uVar) j

Suppose that j is a list here. This means that mapDeRef (findNodes rt1 rt2 uVar) will recursively invoke findNodes multiple times. We want to make sure that any state changes in the first recursive invocation propagate through to the second recursive invocation, and so on. The use MVars ensures that this happens¹.

One observation about the old type signature for findNodes is that the MVar (HashMap DynStableName Int) argument is actually serving two roles:

  • It keeps track of all of the unique numbers encountered so far, and
  • It maps from stable names used in the current program to their corresponding unique numbers.

Conceptually, however, there is no reason why we have to use the same MVar to do both things. Indeed, one could imagine separating these out into two MVar arguments. The map of stable names need not contain only the stable names encountered so far; one could conceivably run findNodes once to pre-populate the map and then reuse the same map in another run of findNodes (more on this point later).

I bring up the topic of splitting up this argument because separating the two roles of the MVar (HashMap DynStableName Int) is exactly what #8 does. Here is the new type signature for findNodes (including my patch in #11 (comment), plus some minor cosmetic changes):

findNodes :: (MuRef s)
          => MVar (HashMap DynStableName Int)
          -> MVar IntSet
          -> MVar [(Int,DeRef s Int)]
          -> MVar Int
          -> s
          -> IO Int

The first two MVars now play the role of the first argument in the old version of findNodes. The MVar (HashMap DynStableName Int) maps from stable names to unique numbers, while the MVar IntSet keeps track of all the unique numbers encountered so far. The IntSet should always be a subset of the range of the HashMap DynStableName Int.

Earlier, I mentioned that we could pre-populate the map of stable names in one invocation of findNodes and reuse the same map in another invocation of findNodes. As it turns out, #8 does exactly this in the reifyGraphs function that it introduces:

-- | 'reifyGraphs' takes a 'Traversable' container 't s' of a data structure 's'
-- admitting 'MuRef', and returns a 't (Graph (DeRef s))' with the graph nodes
-- resolved within the same context.
--
-- This allows for, e.g., a list of mutually recursive structures.
reifyGraphs :: (MuRef s, Traversable t) => t s -> IO (t (Graph (DeRef s)))
reifyGraphs coll = do rt1 <- newMVar M.empty
                      uVar <- newMVar 0
                      flip traverse coll $ \m -> do
                        rt2 <- newMVar []
                        nodeSetVar <- newMVar S.empty
                        root <- findNodes rt1 nodeSetVar rt2 uVar m
                        pairs <- readMVar rt2
                        return (Graph pairs root)

Note how the first argument to findNodes (of type MVar (HashMap DynStableName Int)) is reused across all iterations of the traverse, but the second argument to findNodes (of type MVar IntSet) is created anew in each iteration of the traverse. The reuse of the HashMap is important to ensure that any stable names that are used in subsequent iterations of the traverse always resolve to the same unique numbers.

As a final note, observe that if the MVar IntSet were simply an IntSet, then it would be subject to the findNodes state-propagation issue mentioned above. The use of an MVar ensures that any state changes in the first recursive invocation of findNodes propagate through to the second recursive invocation, and so on.

I'll submit a PR which implements this fix soon, along with some sort of way to test for this in a regression suite.


¹ We could also imagine using StateT instead of MVars here, but either way, we need some way to propagate state changes.

RyanGlScott added a commit that referenced this issue Oct 9, 2020
This implements the fix for #11 described in
#11 (comment).
That is, it turns the `IntSet` argument to `findNodes` into an `MVar IntSet`.

Besides the bugfix itself, this commit refactors the internals of `Data.Reify`
slightly to make it slightly more comprehensible at a glance.
RyanGlScott added a commit that referenced this issue Oct 9, 2020
This implements the fix for #11 described in
#11 (comment).
That is, it turns the `IntSet` argument to `findNodes` into an `MVar IntSet`.

Besides the bugfix itself, this commit refactors the internals of `Data.Reify`
slightly to make it slightly more comprehensible at a glance.
@RyanGlScott
Copy link
Member Author

See #12.

RyanGlScott added a commit that referenced this issue Oct 12, 2020
This implements the fix for #11 described in
#11 (comment).
That is, it turns the `IntSet` argument to `findNodes` into an `MVar IntSet`.

Besides the bugfix itself, this commit refactors the internals of `Data.Reify`
slightly to make it slightly more comprehensible at a glance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant