-
Notifications
You must be signed in to change notification settings - Fork 0
An effcient C suffix array implementation.
License
anadon/libsuffixarray
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
======================================================================== DESCRIPTION============================================================= ======================================================================== This is a C library to handle suffix array construction in a minimalistic, while still setting up all structures needed to replace any use of suffix trees. For more details, see USAGE. ======================================================================== NOTE ======================================================================== The implementation is now correct, but slow. While hack on the SAIS code the library will rely on the rather simple recursive bucket sort. The sort is prone to stack overflows on genomic size data. ======================================================================== BUILD AND INSTALL======================================================= ======================================================================== The standard build and install procedure is as follows: # make install For debugging problems related to the suffixArray structure and related code, verbose output is added when compiled using the following: $ make debug Do be aware however, that this does not install the libraries into system directories. This is because this build is intended to only be useful for those debugging issues related to this library. Such users are expected to be able to use, move, and link the libraries as is appropriate. ======================================================================== USAGE=================================================================== ======================================================================== To use the library, include it using the following: #include <suffixarray.h> To create a usable suffix array object/structure, declare the following: SuffixArray example; To initialize *example*, set it to the following: exampleOne = makeSuffixArray(sequence, sequenceLength); The above uses sequence as an unsigned char*, and sequenceLength as a size_t. The code is implemented such that it can expand to use all the memory in a system. NOTE: It does not implement oppertunistic compression of small numbers to conserve memory, and there are no plans to implement such a feature. To access a given suffix index, use the following: exampleOne.sa_data[i] To access a given Burrow-Wheeler character, use the following: exampleOne.sequence[(example.sa_data[i] + example.length - 1) % example.length] To free resources, use the following: freeSuffixArray(&exampleOne); There also exists EnhancedSuffixArray, which calculates the longest common prefix length (LCP) between any 2 given Suffix Array strings. An EnhancedSuffixArray much be constructed as follows: exampleTwo = makeEnhancedSuffixArray(exampleOne); To access LCP data, use the following: exampleTwo.LCPArray[i] Due to C not supporting inheritence, the SuffixArray data may be access using the following: exampleTwo.sa_struct Note that exampleTwo.sa_struct is exampleOne. In order to free an EnhancedSuffixArray, use the following: freeEnhancedSuffixArray(&exampleTwo); Note that this also frees example one as if freeSuffixArray() has been run on &exampleOne. If and only if a debug build was made, a table of a given EnhancedSuffixArray can be printed to stdout by calling the following: printSuffixArrayContainer(exampleTwo); ======================================================================== NOTES=================================================================== ======================================================================== This library does not play nicely with C++'s STL. The library will not change in because the author disagrees with the C++ standards comittee. See the below post for more. http://blog.copton.net/archives/2007/10/13/stdvector/index.html ======================================================================== AUTHORS================================================================= ======================================================================== Josh Marshall ======================================================================== Additional Credits====================================================== ======================================================================== This library closely implements the structures described in "Replacing suffix trees with enhances suffix arrays" authored by Mohamed Ibrahim Abouelhoda, Stefan Kurtz, and Enno Ohlebusch. Mohamed I. Abouelhoda, Stefan Kurtz, Enno Ohlebusch. 2003. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2 (2004), 53-86. DOI:http://dx.doi.org/10.1016/S1570-8667(03)00065-0 The Suffix Array Induced Sorting was implemented from, but does not strictly conform to the implementation in "Two Efficient Algorithms for Linear Time Suffix Array COnstruction" outlined by Ge Nong, Sen Zhang, and Wai Chan. Nong, G. Zhang, S. and Chan, W. H., Two Efficient Algorithms for Linear Time Suffix Array Construction. in IEEE Transactions on Computers Volume 60, IEEE Computer Society, 1471-1484. DOI:http://dx.doi.org/10.1109/TC.2010.188
About
An effcient C suffix array implementation.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published