Skip to content

anadon/libsuffixarray

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

========================================================================
DESCRIPTION=============================================================
========================================================================

This is a C library to handle suffix array construction in a
minimalistic, while still setting up all structures needed to replace
any use of suffix trees.  For more details, see USAGE.

========================================================================
NOTE
========================================================================

The implementation is now correct, but slow.  While hack on the SAIS code
the library will rely on the rather simple recursive bucket sort.  The
sort is prone to stack overflows on genomic size data.

========================================================================
BUILD AND INSTALL=======================================================
========================================================================

The standard build and install procedure is as follows:

  # make install

For debugging problems related to the suffixArray structure and
related code, verbose output is added when compiled using the following:

  $ make debug

Do be aware however, that this does not install the libraries into
system directories.  This is because this build is intended to only be
useful for those debugging issues related to this library.  Such users
are expected to be able to use, move, and link the libraries as is
appropriate.

========================================================================
USAGE===================================================================
========================================================================

To use the library, include it using the following:

  #include <suffixarray.h>

To create a usable suffix array object/structure, declare the following:

  SuffixArray example;

To initialize *example*, set it to the following:

  exampleOne = makeSuffixArray(sequence, sequenceLength);

The above uses sequence as an unsigned char*, and sequenceLength as
a size_t.  The code is implemented such that it can expand to use all
the memory in a system.

NOTE: It does not implement oppertunistic compression of small numbers
to conserve memory, and there are no plans to implement such a feature.

To access a given suffix index, use the following:

  exampleOne.sa_data[i]

To access a given Burrow-Wheeler character, use the following:

  exampleOne.sequence[(example.sa_data[i] + example.length - 1) % example.length]

To free resources, use the following:

  freeSuffixArray(&exampleOne);

There also exists EnhancedSuffixArray, which calculates the longest
common prefix length (LCP) between any 2 given Suffix Array strings.
An EnhancedSuffixArray much be constructed as follows:

  exampleTwo = makeEnhancedSuffixArray(exampleOne);

To access LCP data, use the following:

  exampleTwo.LCPArray[i]

Due to C not supporting inheritence, the SuffixArray data may be access
using the following:

  exampleTwo.sa_struct

Note that exampleTwo.sa_struct is exampleOne.

In order to free an EnhancedSuffixArray, use the following:

  freeEnhancedSuffixArray(&exampleTwo);

Note that this also frees example one as if freeSuffixArray() has been
run on &exampleOne.

If and only if a debug build was made, a table of a given
EnhancedSuffixArray can be printed to stdout by calling the following:

  printSuffixArrayContainer(exampleTwo);

========================================================================
NOTES===================================================================
========================================================================

This library does not play nicely with C++'s STL.  The library will not
change in because the author disagrees with the C++ standards comittee.
See the below post for more.
http://blog.copton.net/archives/2007/10/13/stdvector/index.html

========================================================================
AUTHORS=================================================================
========================================================================

Josh Marshall

========================================================================
Additional Credits======================================================
========================================================================

This library closely implements the structures described in "Replacing
suffix trees with enhances suffix arrays" authored by Mohamed Ibrahim
Abouelhoda, Stefan Kurtz, and Enno Ohlebusch.

Mohamed I. Abouelhoda, Stefan Kurtz, Enno Ohlebusch. 2003. Replacing
suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms
2 (2004), 53-86. DOI:http://dx.doi.org/10.1016/S1570-8667(03)00065-0

The Suffix Array Induced Sorting was implemented from, but does not
strictly conform to the implementation in "Two Efficient Algorithms for
Linear Time Suffix Array COnstruction" outlined by Ge Nong, Sen Zhang,
and Wai Chan.

Nong, G. Zhang, S. and Chan, W. H., Two Efficient Algorithms for Linear
Time Suffix Array Construction. in IEEE Transactions on Computers Volume
60, IEEE Computer Society, 1471-1484.
DOI:http://dx.doi.org/10.1109/TC.2010.188

About

An effcient C suffix array implementation.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published