Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enrich mutable API for vectors #334

Open
11 of 17 tasks
Shimuuar opened this issue Oct 18, 2020 · 20 comments
Open
11 of 17 tasks

Enrich mutable API for vectors #334

Shimuuar opened this issue Oct 18, 2020 · 20 comments

Comments

@Shimuuar
Copy link
Contributor

Shimuuar commented Oct 18, 2020

There were quite a few proposal to enrich API of mutable vectors: #326, #203, #64, #175. Tris issue tries to collect all proposals from them in one nice list (and adds few on top of them)

Folds & lookup

  • foldlM :: (b -> a -> m b) -> b -> v s a -> m b
  • foldrM :: (a -> b -> m b) -> b -> v s a -> m b
  • mapM_ :: (a -> m ()) -> v s a -> m () / forM_
  • imapM_ :: (Int -> a -> m ()) -> v s a -> m () / forM_
  • ifoldlM / ifoldrM
  • elemIndex

I'm not sure that foldlM_ & friends are worth adding. It's easy to ignore return result of fold anyway: void, _ <- .... It however could be useful to add pure variants:

  • foldl :: (b -> a -> b) -> b -> v s a -> m b
  • foldr :: (a -> b -> b) -> b -> v s a -> m b

They come with performance bonus. We can perform entire fold in the ST monad and only lift to m at last moment. This way GHC only have to use bind from ST monad and it's good at optimizing it

In place mutation

Naming could be slightly confusing here. Should in place map be called mapM or mapM_. Underscore usually means throwing away results of function and mapM builds new structure. I'm slightly lean towards mapM since it's in-place map

  • mapM :: (a -> m a) -> v s a -> m () / forM
  • imapM :: (Int -> a -> m a) -> v s a -> m () / iforM
  • map :: (a -> a) -> v s a -> m () / for
  • imap :: (Int -> a -> a) -> v s a -> m () / ifor
  • modifyM :: v s a -> (a -> m a) -> Int -> m ()
  • previousPermutation :: (PrimMonad m, Ord e, MVector v e) => v (PrimState m) e -> m Bool (to complement nextPermutation)
  • in-place reverse

Creation

  • generate :: Int -> (Int -> a) -> m (v s a)
  • generateM :: Int -> (Int -> m a) -> m (v s a)

Comments? Objections? Proposals?

@Bodigrim
Copy link
Contributor

Looks reasonable, but I would strongly prefer names which at least do not clash with immutable vector functions. Also it is quite unexpected to encounter Foo.map with a signature (a -> a) -> foo -> m ().

In bitvec I use names mapInPlace, zipInPlace, invertInPlace, reverseInPlace, etc.

CC @haskell-core (not sure if I can ping an organization)

@Shimuuar
Copy link
Contributor Author

RE: name clashes. Only way to avoid them is to add prefixes for all mutable versions. Also there's precedent for name clashes: init, tail, drop, replicate etc clash already.

RE: InPlace I like it. It makes clear that vector is modified in place. It also allows to have functions like mapM :: (a -> m b) -> v s a -> m (v s b) which create new vector without name clashes. On other hand it's a bit too long — 7 letters.

@lehins
Copy link
Contributor

lehins commented Oct 19, 2020

I am 💯 in favor of everything in this issue. The only thing I guess we need to reach consensus on is the naming.

I agree InPlace suffix portrays the intention well, and I also agree that it is a bit long. Of course it's ok if we can't come up with something better, but we can try. This in place mutation concept will be shared by many function so maybe it will be enough to add an acronym or a shortened suffix to such functions (a la. M for monadic) and it will be clear enough what it means. For example Mut for mutated? mapMutM, foldlMutM. Or Mod for modified mapModM ...

@Shimuuar
Copy link
Contributor Author

I'd like to propose following naming conventions.

  1. Functions that don't change vector's elements (like folds) or functions that create new vector (replicate/generate) get same name as pure counterpart.

  2. Functions that modify vector in place get some uniform suffix/prefix/infix. Proposals so far"

  • InPlace suffix: mapInPlace, mapMInPlace
  • Mut mixifx: mapMut, mapMutM
  • Same but Mod

I think that Mut is clearly better that Mod since library already uses mutable in API: type family Mutable etc. I prefer Mut over InPlace as well since it's shorter.

@Bodigrim
Copy link
Contributor

Bodigrim commented Oct 19, 2020

I do not see much benefit in brevity, certainly not enough to sacrifice clarity, to be honest. Mut (even if not mistaken for "mute" or "mutual") does not actually say much about semantics, besides that it somehow deals with mutable vectors or, possibly, mutates or changes something. For instance, nothing in the name mapMut hints at an in-place mutation.

@lehins
Copy link
Contributor

lehins commented Oct 19, 2020

I'd agree with you if InPlace suffix would be used in one or two functions, but we are talking about adding quite a few in five different modules, which means a even single letter would suffice to distinguish the concept. For example you know in that M in mapM stands for Monad, we don't name mapMonadic, right?

That being said, this is not something I care too deeply about, so whatever you guys decide on I'll be fine with. I just prefer shorter names as well thus wanted expressed my opinion on this subject. Most importantly we all agree on the concepts: folds / in place / creation 👍

@Bodigrim
Copy link
Contributor

I think we can absolutely go ahead with generate / generateM. They are simple, extremely useful and naming is similar to existing replicate / replicateM.

@cartazio
Copy link
Contributor

Inplace Suffix is probably the least confusing

@leftaroundabout
Copy link
Contributor

I feel strongly that something called map or mapM should not perform any in-place updates. As I commented in #326, I would suggest

modifyM :: MVector (PrimState m) a -> (a -> m a) -> Int -> m ()
modifyAll :: MVector (PrimState m) a -> (a -> a) -> m ()
modifyAllM :: MVector (PrimState m) a -> (a -> m a) -> m ()

but an InPlace suffix also seems reasonable.

For the suggested functions that don't perform mutation – well, I'm not completely convinced these are necessary at all if they always just boil to some pure operation over a freezed MVector. Is there a performance reason we'd want these directly for mutable vectors?

Name clashes I personally don't really mind, since I always import the immutable and mutable modules with different qualifiers, but how universal is this style? I could imagine quite a few packages would break if V.mapM becomes ambiguous.

If we're going to have clashes, then once again: please no different semantics for same-name functions!

@cartazio
Copy link
Contributor

cartazio commented Oct 25, 2020 via email

@lehins
Copy link
Contributor

lehins commented Oct 25, 2020

I thought about this one as well:

I feel strongly that something called map or mapM should not perform any in-place updates.

... just looked at the linked ticket and that was exactly where I thought about it 😄

Not sure modify by itself is the right name though, but modifyAll and modifyAllM sound reasonable and correlates nicely with modify function that deals with a single element..

For the suggested functions that don't perform mutation – well, I'm not completely convinced these are necessary at all ...

If you use unsafeFreeze, then they are indeed not necessary, but the problem is we do not want users to rely on unsafe functions. Let me outline the actual problems with relying on freezing into pure vectors with examples.

This is what we want:

mutableVectorOps mvec = do
  MV.write mvec 0 "foo"
  MV.mapM_ putStrLn mvec
  MV.write mvec 1 "bar"
  MV.mapM_ putStrLn mvec

Above function is something we would like the user be able to write and not worry about issues that come along with conversion to immutable vectors:

  • redundant copies:
mutableVectorOps mvec = do
  MV.write mvec 0 "foo"
  vec1 <- V.freeze mvec
  V.mapM_ putStrLn vec1
  write mvec 1 "bar"
  vec2 <- V.freeze mvec -- second copy of the same vector, ouch
  V.mapM_ putStrLn vec2
  • unsafety of operations when no-copy approach is used
mutableVectorOps mvec = do
  MV.write mvec 0 "foo"
  vec <- V.unsafeFreeze mvec
  V.mapM_ putStrLn vec
  write mvec 1 "bar"
  V.mapM_ putStrLn vec -- immutable vector was mutated, ouch

@cartazio
Copy link
Contributor

cartazio commented Oct 26, 2020 via email

@Shimuuar
Copy link
Contributor Author

I've added PR with what I think uncontroversial additions. Now there's question whether right folds should be included. We can't use laziness to abort computation early and have to read all elements anyway

@konsumlamm
Copy link
Contributor

I suggest you pin this issue instead of #203, which was closed in favor of this.

@lehins lehins pinned this issue Nov 15, 2020
@lehins
Copy link
Contributor

lehins commented Nov 15, 2020

@konsumlamm Thanks, done.

@konsumlamm
Copy link
Contributor

I just had a look at the different ways to construct mutable and immutable vectors and i noticed that mutable vectors have very few. It is proposed to add generate/generateM (which i have no problem with), but there are still a lot of other construction functions one could add (iterateN/iterateNM, unfoldr/unfoldrM, unfoldrN/unfoldrNM, enumFromN, enumFromStepN, ...).

We could just port these as well, though that would mean writing a lot of (possibly unnecessary) code. It should be possible to implement all the mutable variants with unsafeThaw (I'm not sure if that would be less efficient than implementing them manually, since the immutable versions all do streaming). However that probably shouldn't be encouraged, since it is (as the name says) unsafe, if used wrongly.

Another thought I had was implementing a safe O(1) thaw that uses linear types to prevent the user from using the immutable vector afterwards. Although that may be undesirable, since it requires a new language extension.

@Bodigrim
Copy link
Contributor

Not only it requires a new language extension, but it also makes this API available to GHC 9.0+ users only. I think it is too early to adopt linear types in vector.

@konsumlamm
Copy link
Contributor

I noticed that there already is an in-place reverse in Data.Vector.Generic.Mutable, but it is marked as an internal operation. I don't see a reason why this shouldn't be a normal operation and be reexported in the specialized modules.

@Boarders
Copy link
Contributor

What is the status of this work? Is there anything I can do to help it along? I am constantly frustrated when working with mutable vectors that I don't have map or traverse or similar functions and need to write basic recursions by hand. Would be great to see some movement here.

@lehins
Copy link
Contributor

lehins commented Jul 17, 2021

@Boarders, some of the functions have already been implemented in #338 , so there is no need to resort to manual loops. For example, generateM is already available, thus traverse f mv = generateM (length mv) (unsafeRead mv >=> f)

That being said, PRs are welcome 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants