populate gem
tcrouch committed Sep 22, 2017
1 parent d26789e commit f5b32b4
Showing 27 changed files with 1,237 additions and 25 deletions.
22 changes: 22 additions & 0 deletions .rubocop.yml
@@ -0,0 +1,22 @@
AllCops:
  TargetRubyVersion: 2.4
Metrics/AbcSize:
  Enabled: false
Metrics/CyclomaticComplexity:
  Enabled: false
Metrics/MethodLength:
  Enabled: false
Metrics/PerceivedComplexity:
  Enabled: false
Metrics/BlockLength:
  Exclude:
    - "tasks/**/*.rake"

Style/StringLiterals:
  EnforcedStyle: double_quotes
Layout/AlignParameters:
  EnforcedStyle: with_fixed_indentation
Layout/MultilineMethodCallIndentation:
  EnforcedStyle: indented
Layout/MultilineOperationIndentation:
  EnforcedStyle: indented
1 change: 1 addition & 0 deletions .yardopts
@@ -0,0 +1 @@
-m markdown - LICENSE.txt
4 changes: 3 additions & 1 deletion Gemfile
@@ -1,6 +1,8 @@
# frozen_string_literal: true

source "https://rubygems.org"

git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }

# Specify your gem's dependencies in edits.gemspec
gemspec
67 changes: 63 additions & 4 deletions README.md
@@ -1,8 +1,8 @@
# Edits

Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/edits`. To experiment with that code, run `bin/console` for an interactive prompt.
A collection of edit distance algorithms in Ruby.

TODO: Delete this and the text above, and describe your gem
Includes Levenshtein, Restricted Edit (Optimal Alignment) and Damerau-Levenshtein distances, and Jaro and Jaro-Winkler similarity.

## Installation

@@ -22,7 +22,66 @@ Or install it yourself as:

## Usage

TODO: Write usage instructions here
### Levenshtein

Edit distance, accounting for deletion, addition and substitution.

```ruby
Edits::Levenshtein.distance "raked", "bakers"
# => 3
Edits::Levenshtein.distance "iota", "atom"
# => 4
Edits::Levenshtein.distance "acer", "earn"
# => 4

# Max distance
Edits::Levenshtein.distance_with_max "iota", "atom", 2
# => 2
Edits::Levenshtein.most_similar "atom", %w[tram atlas rota racer]
# => "atlas"
```
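
The three operations above map directly onto the classic dynamic-programming recurrence. The following is a minimal sketch of that recurrence, not the gem's implementation; `levenshtein_sketch` is a hypothetical name used only for illustration, and it assumes plain `String` input.

```ruby
# Hypothetical helper for illustration; not part of the Edits gem.
def levenshtein_sketch(str1, str2)
  a = str1.chars
  b = str2.chars
  # Previous row of the cost matrix: distance from "" to each prefix of b.
  prev = (0..b.length).to_a

  a.each_with_index do |a_ch, i|
    # First cell of the row: deleting the first i + 1 characters of a.
    curr = [i + 1]
    b.each_with_index do |b_ch, j|
      sub_cost = a_ch == b_ch ? 0 : 1
      curr << [
        prev[j + 1] + 1,    # deletion
        curr[j] + 1,        # addition
        prev[j] + sub_cost  # substitution (or match)
      ].min
    end
    prev = curr
  end

  prev.last
end

levenshtein_sketch("raked", "bakers")
# => 3
levenshtein_sketch("iota", "atom")
# => 4
```

Only two rows are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.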

### Restricted Edit (Optimal Alignment)

Edit distance, accounting for deletion, addition, substitution and swapped
characters.

```ruby
Edits::RestrictedEdit.distance "raked", "bakers"
# => 3
Edits::RestrictedEdit.distance "iota", "atom"
# => 3
Edits::RestrictedEdit.distance "acer", "earn"
# => 4
```
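
To show where the swapped-character case enters, here is a minimal sketch of the optimal alignment recurrence. It is an illustration under assumptions, not the gem's code: `restricted_edit_sketch` is a hypothetical name, and the input is assumed to be plain `String`s.

```ruby
# Hypothetical helper for illustration; not part of the Edits gem.
def restricted_edit_sketch(str1, str2)
  a = str1.chars
  b = str2.chars

  # d[i][j] = distance between the first i items of a and the first j of b.
  d = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
  (0..a.length).each { |i| d[i][0] = i }
  (0..b.length).each { |j| d[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      sub_cost = a[i - 1] == b[j - 1] ? 0 : 1
      cost = [
        d[i - 1][j] + 1,            # deletion
        d[i][j - 1] + 1,            # addition
        d[i - 1][j - 1] + sub_cost  # substitution (or match)
      ].min

      # Swap of two adjacent characters, counted as a single edit.
      swapped = i > 1 && j > 1 &&
                a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
      cost = [cost, d[i - 2][j - 2] + 1].min if swapped

      d[i][j] = cost
    end
  end

  d[a.length][b.length]
end

restricted_edit_sketch("iota", "atom")
# => 3
restricted_edit_sketch("ca", "abc")
# => 3
```

The swap is only recognised for characters that are adjacent in both strings, which is why "ca" and "abc" score 3 here: no substring is edited more than once.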

### Damerau-Levenshtein

Edit distance, accounting for deletion, addition, substitution and
transposition.

```ruby
Edits::DamerauLevenshtein.distance "raked", "bakers"
# => 3
Edits::DamerauLevenshtein.distance "iota", "atom"
# => 3
Edits::DamerauLevenshtein.distance "acer", "earn"
# => 3
```

### Jaro & Jaro-Winkler

```ruby
Edits::Jaro.similarity "information", "informant"
# => 0.90235690235690236
Edits::Jaro.distance "information", "informant"
# => 0.097643097643097643

Edits::JaroWinkler.similarity "information", "informant"
# => 0.94141414141414137
Edits::JaroWinkler.distance "information", "informant"
# => 0.05858585858585863
```
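
For readers who want to see how these figures arise, the following is a rough sketch of the textbook Jaro and Jaro-Winkler formulas. It is not the gem's implementation: `jaro_sketch` and `jaro_winkler_sketch` are hypothetical names, and the prefix scale of 0.1 with a maximum prefix of 4 are the commonly quoted defaults, assumed here.

```ruby
# Hypothetical helpers for illustration; not part of the Edits gem.
def jaro_sketch(str1, str2)
  a = str1.chars
  b = str2.chars
  return 1.0 if a.empty? && b.empty?
  return 0.0 if a.empty? || b.empty?

  # Characters match only if they are equal and at most `window` apart.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  b_used = Array.new(b.length, false)
  a_matched = []

  a.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_used[j] || b[j] != ch
      b_used[j] = true
      a_matched << ch
      break
    end
  end

  m = a_matched.length
  return 0.0 if m.zero?

  # Matched characters of b, in b's own order.
  b_matched = b.each_index.select { |j| b_used[j] }.map { |j| b[j] }
  # Each out-of-order pair is half a transposition.
  t = a_matched.zip(b_matched).count { |x, y| x != y } / 2

  (m.to_f / a.length + m.to_f / b.length + (m - t).to_f / m) / 3
end

def jaro_winkler_sketch(str1, str2, scale = 0.1)
  sim = jaro_sketch(str1, str2)
  max_prefix = [4, str1.length, str2.length].min
  prefix = 0
  prefix += 1 while prefix < max_prefix && str1[prefix] == str2[prefix]
  # Boost the score in proportion to the shared prefix length.
  sim + prefix * scale * (1 - sim)
end
```

For "information" and "informant" there are 9 matching characters and 1 transposition, giving (9/11 + 9/9 + 8/9) / 3 ≈ 0.9024; the shared 4-character prefix then lifts the Jaro-Winkler similarity to about 0.9414. The distance in each case is 1 minus the similarity.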

## Development

@@ -32,7 +91,7 @@ To install this gem onto your local machine, run `bundle exec rake install`. To

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/edits.
Bug reports and pull requests are welcome on GitHub at https://github.com/tcrouch/edits.

## License

6 changes: 5 additions & 1 deletion Rakefile
@@ -1,6 +1,10 @@
# frozen_string_literal: true

require "bundler/gem_tasks"
require "rspec/core/rake_task"

Dir["tasks/**/*.rake"].each { |t| load t }

RSpec::Core::RakeTask.new(:spec)

task :default => :spec
task default: :spec
1 change: 1 addition & 0 deletions bin/console
@@ -1,4 +1,5 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/setup"
require "edits"
25 changes: 10 additions & 15 deletions edits.gemspec
@@ -1,4 +1,5 @@
# coding: utf-8
# frozen_string_literal: true

lib = File.expand_path("../lib", __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require "edits/version"
@@ -9,21 +10,12 @@ Gem::Specification.new do |spec|
  spec.authors = ["Tom Crouch"]
  spec.email = ["[email protected]"]

  spec.summary = %q{TODO: Write a short summary, because Rubygems requires one.}
  spec.description = %q{TODO: Write a longer description or delete this line.}
  spec.homepage = "TODO: Put your gem's website or public repo URL here."
  spec.summary = "A collection of edit distance algorithms."
  # spec.description = "TODO: Write a longer description or delete this line."
  spec.homepage = "https://github.com/tcrouch/edits"
  spec.license = "MIT"

  # Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
  # to allow pushing to a single host or delete this section to allow pushing to any host.
  if spec.respond_to?(:metadata)
    spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
  else
    raise "RubyGems 2.0 or newer is required to protect against " \
      "public gem pushes."
  end

  spec.files = `git ls-files -z`.split("\x0").reject do |f|
  spec.files = `git ls-files -z`.split("\x0").reject do |f|
    f.match(%r{^(test|spec|features)/})
  end
  spec.bindir = "exe"
@@ -32,5 +24,8 @@ Gem::Specification.new do |spec|

  spec.add_development_dependency "bundler", "~> 1.15"
  spec.add_development_dependency "rake", "~> 10.0"
  spec.add_development_dependency "rspec", "~> 3.0"
  spec.add_development_dependency "rspec", "~> 3.6"
  spec.add_development_dependency "benchmark-ips"
  spec.add_development_dependency "redcarpet"
  spec.add_development_dependency "yard", "~> 0.9.9"
end
10 changes: 10 additions & 0 deletions lib/edits.rb
@@ -1,5 +1,15 @@
# frozen_string_literal: true

require "edits/version"

require "edits/damerau_levenshtein"
require "edits/hamming"
require "edits/jaro"
require "edits/jaro_winkler"
require "edits/levenshtein"
require "edits/restricted_edit"

# A collection of edit distance algorithms
module Edits
  # Your code goes here...
end
94 changes: 94 additions & 0 deletions lib/edits/damerau_levenshtein.rb
@@ -0,0 +1,94 @@
# frozen_string_literal: true

module Edits
  # Implementation of the Damerau/Levenshtein distance algorithm.
  #
  # Determines distance between two strings by counting edits, identifying:
  # * Insertion
  # * Deletion
  # * Substitution
  # * Transposition
  module DamerauLevenshtein
    # Calculate the Damerau/Levenshtein distance of two sequences.
    #
    # @example
    #   DamerauLevenshtein.distance("acer", "earn")
    #   # => 3
    # @param seq1 [String, Array]
    # @param seq2 [String, Array]
    # @return [Integer]
    def self.distance(seq1, seq2)
      if seq1.length > seq2.length
        temp = seq1
        seq1 = seq2
        seq2 = temp
      end

      # array of Integer codepoints outperforms String
      seq1 = seq1.codepoints if seq1.is_a? String
      seq2 = seq2.codepoints if seq2.is_a? String

      rows = seq1.length
      cols = seq2.length
      return cols if rows.zero?
      return rows if cols.zero?

      # 'infinite' edit distance for padding cost matrix.
      # Can be any value greater than max[rows, cols]
      inf = rows + cols

      # Initialize first two rows of cost matrix.
      # The full initial state where cols=3, rows=2 (inf=5) would be:
      #   [[5, 5, 5, 5, 5],
      #    [5, 0, 1, 2, 3],
      #    [5, 1, 0, 0, 0],
      #    [5, 2, 0, 0, 0]]
      matrix = [Array.new(cols + 2, inf)]
      matrix << 0.upto(cols).to_a.unshift(inf)

      # element => last row seen
      item_history = Hash.new(0)

      1.upto(rows) do |row|
        # generate next row of cost matrix
        new_row = Array.new(cols + 2, 0)
        new_row[0] = inf
        new_row[1] = row
        matrix << new_row

        last_match_col = 0
        seq1_item = seq1[row - 1]

        1.upto(cols) do |col|
          seq2_item = seq2[col - 1]
          last_match_row = item_history[seq2_item]

          sub_cost = seq1_item == seq2_item ? 0 : 1

          transposition = 1 + matrix[last_match_row][last_match_col]
          transposition += row - last_match_row - 1
          transposition += col - last_match_col - 1

          # TODO: do insertion/deletion need to be considered when
          # seq1_item == seq2_item ?
          deletion = matrix[row][col + 1] + 1
          insertion = matrix[row + 1][col] + 1
          substitution = matrix[row][col] + sub_cost

          # step cost is min of operation costs
          cost = substitution < insertion ? substitution : insertion
          cost = deletion if deletion < cost
          cost = transposition if transposition < cost

          matrix[row + 1][col + 1] = cost

          last_match_col = col if sub_cost.zero?
        end

        item_history[seq1_item] = row
      end

      matrix[rows + 1][cols + 1]
    end
  end
end
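
The padded cost matrix described in the comments of `DamerauLevenshtein.distance` can be reproduced in isolation. The sketch below builds only the initial state, before any costs are filled in; `padded_cost_matrix` is a hypothetical helper that does not exist in the gem, where the rows are appended lazily inside the main loop.

```ruby
# Hypothetical helper for illustration; the gem builds these rows inline.
def padded_cost_matrix(rows, cols)
  # Any value greater than max(rows, cols) works as 'infinity'.
  inf = rows + cols

  # Row 0 is all padding; row 1 holds the costs 0..cols behind one pad cell.
  matrix = [Array.new(cols + 2, inf)]
  matrix << 0.upto(cols).to_a.unshift(inf)

  # Remaining rows: pad cell, then the row number, then unfilled zeros.
  1.upto(rows) do |row|
    new_row = Array.new(cols + 2, 0)
    new_row[0] = inf
    new_row[1] = row
    matrix << new_row
  end

  matrix
end

padded_cost_matrix(2, 3)
# => [[5, 5, 5, 5, 5], [5, 0, 1, 2, 3], [5, 1, 0, 0, 0], [5, 2, 0, 0, 0]]
```

The extra row and column of `inf` mean that a lookup for an item never seen before (row 0 or column 0) yields a transposition cost too large to be selected, so the inner loop needs no special case for it.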
26 changes: 26 additions & 0 deletions lib/edits/hamming.rb
@@ -0,0 +1,26 @@
# frozen_string_literal: true

module Edits
  # @see https://en.wikipedia.org/wiki/Hamming_distance
  module Hamming
    # Calculate the Hamming distance between two sequences.
    #
    # @note A true distance metric, satisfies triangle inequality.
    #
    # @param seq1 [String, Array]
    # @param seq2 [String, Array]
    # @return [Integer] Hamming distance
    def self.distance(seq1, seq2)
      # if seq1.is_a?(Integer) && seq2.is_a?(Integer)
      #   return (seq1 ^ seq2).to_s(2).count("1")
      # end

      length = seq1.length < seq2.length ? seq1.length : seq2.length
      diff = (seq1.length - seq2.length).abs

      length.times.reduce(diff) do |distance, i|
        seq1[i] == seq2[i] ? distance : distance + 1
      end
    end
  end
end
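
As a standalone illustration of the behaviour above, the sketch below counts mismatched positions and adds the length difference for unequal inputs, then shows the integer variant that is commented out in the method. Both names are hypothetical and the code is not the gem's; the bit-counting helper assumes non-negative integers.

```ruby
# Hypothetical helpers for illustration; not part of the Edits gem.
def hamming_sketch(seq1, seq2)
  shorter, longer = [seq1, seq2].sort_by(&:length)
  # Positions present in both sequences that hold different items.
  mismatches = (0...shorter.length).count { |i| shorter[i] != longer[i] }
  # Every surplus item in the longer sequence counts as one difference.
  mismatches + (longer.length - shorter.length)
end

# Integer variant: XOR leaves a 1 bit wherever the two inputs differ.
def hamming_bits_sketch(int1, int2)
  (int1 ^ int2).to_s(2).count("1")
end

hamming_sketch("karolin", "kathrin")
# => 3
hamming_sketch("abc", "abcde")
# => 2
hamming_bits_sketch(0b1011101, 0b1001001)
# => 2
```

Strictly, Hamming distance is defined only for sequences of equal length; treating the surplus items as differences is the extension this implementation chooses.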