Attempting to build a fast, memory efficient K-Means program.
gem sources -a http://rubygems.org sudo gem install k_means
require 'rubygems' require 'k_means' @data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]] kmeans = KMeans.new(@data, :centroids => 2) kmeans.inspect # Use kmeans.view to get hold of the un-inspected array => [[3, 4], [0, 1, 2]]
KMeans uses the Distance Measures Gem (github.com/reddavis/Distance-Measures) so we get quite a range of distance measurements.
The measurements currently available are:
euclidean_distance cosine_similarity jaccard_index jaccard_distance binary_jaccard_index binary_jaccard_distance tanimoto_coefficient
To specify a particular one to use in the KMeans algorithm, just provide it as an option:
KMeans.new(@data, :distance_measure => :jaccard_index) KMeans.new(@data, :distance_measure => :cosine_similarity) KMeans.new(@data, :distance_measure => :tanimoto_coefficient)
You get the idea…
# 1000 records with 50 dimensions data = Array.new(1000) {Array.new(50) {rand(10)}} ai4r_data = Ai4r::Data::DataSet.new(:data_items=> data) # Clustering can happen in magical ways # so lets do it over multiple times n = 5 Benchmark.bm do |x| x.report('KMeans') do n.times { KMeans.new(data) } end x.report("Ai4R") do n.times do b = Ai4r::Clusterers::KMeans.new b.build(ai4r_data, 4) end end end user system total real KMeans 15.960000 0.030000 15.990000 ( 16.062639) Ai4R 70.230000 0.180000 70.410000 ( 70.704843)
-
David Richards - For his code reviews and all round helpfulness. - github.com/davidrichards
Copyright © 2009 Red Davis. See LICENSE for details.