Skip to content

Latest commit

 

History

History
26 lines (20 loc) · 2.63 KB

README.md

File metadata and controls

26 lines (20 loc) · 2.63 KB

This module is an optimized implementation of Ukkonen's suffix tree algorithm in python which will be having most of the important text processing functionalities such as:

Search for strings:

Check if a string P of length m is a substring in O(m) time.
Find the first occurrence of the patterns P1,... ,Pq of total length m as substrings in O(m) time.

Find all z occurrences of the patterns P1,... ,Pq of total length m as substrings in O(m+z) time.

  • Search for a regular expression P in time expected sublinear in n
  • Find for each suffix of a pattern P the length of the longest match between a prefix of P[i... m] and a substring in D in image time. This is termed the matching statistics for P

Find properties of the strings:

  • Find the longest common substrings of the string Si and Sj in image time.
  • Find all maximal pairs, maximal repeats or supermaximal repeats in image time.
  • Find the Lempel–Ziv decomposition in image time.[10]
  • Find the longest repeated substrings in image time.
  • Find the most frequently occurring substrings of a minimum length in image time.
  • Find the shortest strings from image that do not occur in D in O(n+z) time, if there are z such strings.
  • Find the shortest substrings occurring only once in image time.
  • Find, for each i the shortest substrings of Si not occurring elsewhere in D in image time.

sources: