cmd.csp.similarity

License: MIT Maintenance GitHub release GitHub tag GitHub commits GitHub contributors

A library implementing a range of string similarity and distance measures, designed for ease of use. About a dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented. It is used in the Cognitive Service Platform cmd.csp for the NLP and classifier parts.

Prerequisites

There are no prerequisites.

Included dependencies:

<dependency>
    <groupId>net.jcip</groupId>
    <artifactId>jcip-annotations</artifactId>
    <version>1.0</version>
</dependency>

Installing/Usage

To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):

<repository>
  <id>github</id>
  <name>GitHub swelcker Apache Maven Packages</name>
  <url>https://maven.pkg.github.com/swelcker</url>
</repository>

<dependency>
  <groupId>cmd.csp</groupId>
  <artifactId>cspsimilarity</artifactId>
  <version>1.0.0</version>
</dependency>

Then import cspsimilarity.* in your application:

// Example
import cspsimilarity.*;
...
    // One engine instance per measure
    private NormalizedLevenshtein engineNL = new NormalizedLevenshtein();
    private JaroWinkler engineJW = new JaroWinkler();
    private MetricLCS engineMLCS = new MetricLCS();
    private NGram engineNGRAM = new NGram(3);
    private Cosine engineCOSINE = new Cosine(9);
    private Jaccard engineJACARD = new Jaccard(9);
    private SorensenDice engineSOREDICE = new SorensenDice(9);
...
    String source = sourceText;
    String search = toSearch;

    // sS is overwritten by each call; the lines below only illustrate the API.
    double sS = 0d;

    // Measures that expose similarity() directly
    sS = engineNL.similarity(source, search);
    sS = engineJW.similarity(source, search);

    // Measures used through distance(): convert to a similarity with 1 - distance
    sS = 1d - engineMLCS.distance(source, search);
    sS = 1d - engineNGRAM.distance(source, search);

    sS = engineCOSINE.similarity(source, search);
    sS = engineJACARD.similarity(source, search);
    sS = engineSOREDICE.similarity(source, search);
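
The example above mixes similarity() calls with 1 - distance() conversions. As a rough illustration of how the two relate, here is a minimal, self-contained sketch of a normalized Levenshtein measure in plain Java. It does not use this library and is not its implementation; the class name SimilaritySketch and its method names are hypothetical, and dividing by the length of the longer string is an assumption about how the normalization is done.

```java
// Illustrative sketch only; NOT the library's implementation.
// Class and method names are hypothetical.
final class SimilaritySketch {

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Edit distance scaled into [0, 1] by the longer string's length (assumed normalization). */
    static double normalizedDistance(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 0.0 : (double) levenshtein(a, b) / longest;
    }

    /** Similarity is the complement of the normalized distance. */
    static double similarity(String a, String b) {
        return 1.0 - normalizedDistance(a, b);
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));        // edit distance
        System.out.println(normalizedDistance("kitten", "sitting")); // value in [0, 1]
        System.out.println(similarity("kitten", "sitting"));         // 1 - distance
    }
}
```

Because the distance is scaled into [0, 1], identical strings score a similarity of 1 and strings with no characters in common at any position score 0, which makes results from different measures easier to compare on one scale.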

Built With

  • Maven - Dependency Management

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

  • Stefan Welcker - Modifications based on tdebatty/java-string-similarity

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.