-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text: add fun isSubsequenceOf #369
base: master
Are you sure you want to change the base?
Changes from all commits
813989c
c53b59f
173c1ae
f477ba6
2a8e3d9
865ada4
073d676
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -167,6 +167,7 @@ module Data.Text | |
, isPrefixOf | ||
, isSuffixOf | ||
, isInfixOf | ||
, isSubsequenceOf | ||
|
||
-- ** View patterns | ||
, stripPrefix | ||
|
@@ -253,6 +254,7 @@ import qualified Language.Haskell.TH.Lib as TH | |
import qualified Language.Haskell.TH.Syntax as TH | ||
import Text.Printf (PrintfArg, formatArg, formatString) | ||
import System.Posix.Types (CSsize(..)) | ||
import Data.Function (on) | ||
|
||
-- $setup | ||
-- >>> import Data.Text | ||
|
@@ -1881,6 +1883,43 @@ isInfixOf needle haystack | |
| otherwise = not . L.null . indices needle $ haystack | ||
{-# INLINE [1] isInfixOf #-} | ||
|
||
-- 2021-09-29: NOTE: | ||
-- * after the implementation - determine & mention the big O | ||
-- | The 'isSubsequenceOf' function takes the main text and the subsequnce | ||
-- to find and returns 'True' iff the second argument is a subsequence | ||
-- of the first. | ||
-- | ||
-- "Subsequence" used in the meaning of: characters of the second argument | ||
-- appear in same sequential order in the main data, to say second argument can | ||
-- be derived by deleting some (any) or no elements from the first. | ||
-- | ||
-- Examples: | ||
-- | ||
-- >>> isSubsequenceOf "1234567" "1356" | ||
-- True | ||
-- | ||
-- >>> isSubsequenceOf "1234567" "21" | ||
-- False | ||
-- | ||
-- `isSubsequenceOf` is the base case & implementation of fuzzy search. | ||
isSubsequenceOf :: Text -> Text -> Bool | ||
isSubsequenceOf tf sf | ||
| length sf > length tf = False | ||
| otherwise = subseqOf tf sf | ||
where | ||
subseqOf :: Text -> Text -> Bool | ||
subseqOf t s = | ||
on f uncons t s | ||
where | ||
f :: Maybe (Char, Text) -> Maybe (Char, Text) -> Bool | ||
f _ Nothing = True | ||
f Nothing _ = False | ||
f (Just (tc,ts)) (Just (sc,ss)) = | ||
subseqOf ts $ | ||
if tc == sc | ||
then s | ||
else ss | ||
Comment on lines
+1910
to
+1921
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a good reason not to just use isSubsequenceOf needles haystack = List.isSubsequenceOf (T.unpack needles) (T.unpack haystack) This code is doing exactly the same work, since |
||
|
||
------------------------------------------------------------------------------- | ||
-- * View patterns | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -20,6 +20,8 @@ import qualified Data.Text.Internal.Lazy as TL (Text(..)) | |||||||||||||||||||||||
import qualified Data.Text.Internal.Lazy.Fusion as SL | ||||||||||||||||||||||||
import qualified Data.Text.Lazy as TL | ||||||||||||||||||||||||
import qualified Tests.SlowFunctions as Slow | ||||||||||||||||||||||||
import Control.Monad (replicateM) | ||||||||||||||||||||||||
import Data.List (nub, sort) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
s_take n = L.take n `eqP` (unpackS . S.take n) | ||||||||||||||||||||||||
s_take_s (Small n) = L.take n `eqP` (unpackS . S.unstream . S.take n) | ||||||||||||||||||||||||
|
@@ -231,6 +233,47 @@ tl_isSuffixOf s = L.isSuffixOf s`eqP` TL.isSuffixOf (packS s) | |||||||||||||||||||||||
t_isInfixOf s = L.isInfixOf s `eqP` T.isInfixOf (packS s) | ||||||||||||||||||||||||
tl_isInfixOf s = L.isInfixOf s `eqP` TL.isInfixOf (packS s) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
-- | Generator for substrings that keeps the element order. | ||||||||||||||||||||||||
-- Aka: "1234567890" -> "245680" | ||||||||||||||||||||||||
genOrdSubseq :: T.Text -> Gen T.Text | ||||||||||||||||||||||||
genOrdSubseq txt = | ||||||||||||||||||||||||
T.pack . transform <$> genTransformMap | ||||||||||||||||||||||||
Comment on lines
+239
to
+240
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can't understand what this implementation is actually doing, but is this not all we need?
Suggested change
We're just trying to select a random sample of elements from the given string in order right? Are there any other properties we need from this? Trying to ensure that the result is non-empty isn't too hard, you just generate two lists of Bools:
Suggested change
(not sure if I have the maths right, but I'm trying to make a list of |
||||||||||||||||||||||||
where | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
pickN :: Gen Int | ||||||||||||||||||||||||
pickN = | ||||||||||||||||||||||||
choose (0, T.length txt) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
pickNs :: Gen [Int] | ||||||||||||||||||||||||
pickNs = | ||||||||||||||||||||||||
fmap (sort . nub) $ (`replicateM` pickN) =<< pickN | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
growInst :: [Bool] -> Int -> [Bool] | ||||||||||||||||||||||||
growInst ls n = | ||||||||||||||||||||||||
ls | ||||||||||||||||||||||||
<> take (length ls - pred n) [True ..] | ||||||||||||||||||||||||
<> [False] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
mkTransformInst :: [Bool] -> [Int] -> [Bool] | ||||||||||||||||||||||||
mkTransformInst bls [] = | ||||||||||||||||||||||||
bls | ||||||||||||||||||||||||
<> take (T.length txt - length bls) [True ..] | ||||||||||||||||||||||||
mkTransformInst bls (i:is) = | ||||||||||||||||||||||||
mkTransformInst | ||||||||||||||||||||||||
(growInst bls i) | ||||||||||||||||||||||||
is | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
mkTransformMap :: [a] -> [Int] -> [(a, Bool)] | ||||||||||||||||||||||||
mkTransformMap ls ixs = | ||||||||||||||||||||||||
zip ls (mkTransformInst mempty ixs) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
genTransformMap :: (Gen [(Char, Bool)]) | ||||||||||||||||||||||||
genTransformMap = fmap (mkTransformMap $ T.unpack txt) pickNs | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
transform :: [(Char, Bool)] -> [Char] | ||||||||||||||||||||||||
transform = | ||||||||||||||||||||||||
foldr (\ (c, b) as -> as <> if b then [c] else mempty) mempty | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
t_stripPrefix s = (fmap packS . L.stripPrefix s) `eqP` T.stripPrefix (packS s) | ||||||||||||||||||||||||
tl_stripPrefix s = (fmap packS . L.stripPrefix s) `eqP` TL.stripPrefix (packS s) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't the arguments her backwards? Generally
isInfixOf
etc. are used infix:but this would be
which should be False, if you read it in English. It should be
isSubsequenceOf needles haystack
-"1234567"
is not a subsequence of"1356"
.